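A SAS program typically consists of data steps, which read and prepare data, and one or more (statistical) procedures. Each procedure runs as a self-contained unit, although some depend on others. For example, by-group processing in proc means requires the data to be sorted first with proc sort. Some often-used procedures for statistical analysis are explained in detail below.

__Proc print__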

Proc print lists the observations of the dataset named in the data=dataname option, which follows the keyword print. The data= option is common to almost every SAS procedure. It is a good habit to use it all the time so that you know which dataset you are working with. If you omit it, SAS uses the most recently created dataset, which may not be the one you intended. This matters especially when there are multiple datasets, which is usually the case when you perform statistical analysis in SAS. Here's an example of how proc print works. In the data step section, we created a dataset called a1 with three variables (gender, age, weight) and seven observations. It's a good idea to always check that SAS has read in your dataset correctly before performing any analyses on it.

```sas
proc print data=a1;
run;
```

If you highlight this section of code and click the run button, you'll see the dataset in the output window as follows:

```
Obs    gender    age    weight

 1       M        13      143
 2       M        16      132
 3       F        19      140
 4       M        20      120
 5       M        15      110
 6       F        18       95
 7       F        22      105
```

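The data step that created a1 belongs to an earlier section and is not reproduced here. Because the listing above shows every observation, a1 can be rebuilt from it. The sketch below assumes list input with inline datalines; the original data step may have read the data differently.

```sas
/* Rebuild a1 from the listing above (assumed layout):
   gender is character ($), age and weight are numeric */
data a1;
  input gender $ age weight;
  datalines;
M 13 143
M 16 132
F 19 140
M 20 120
M 15 110
F 18 95
F 22 105
;
run;
```

Running proc print on this dataset should reproduce the listing above.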

To list only some of the variables in the dataset, add a var statement after the proc print line, for example var gender age;. The output then looks like the listing above, except that the weight column is left out.

__Proc univariate__

It is one of the most important procedures for elementary statistical analysis. It reports basic descriptive statistics (moments, quantiles, extreme observations, and tests for location) for one or more numeric variables, and has optional statements that generate qq plots and histograms. Sample code follows:

```sas
proc univariate data=a1;
  var weight;
  qqplot;
  histogram;
run;
```

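The qqplot and histogram statements also accept variable names and distribution options. A minimal sketch, assuming the a1 dataset from the data step section, that adds a normal reference line to the qq plot and a fitted normal density curve to the histogram:

```sas
proc univariate data=a1;
  var weight;
  /* reference line for a normal distribution whose mean and
     standard deviation are estimated from the data */
  qqplot weight / normal(mu=est sigma=est);
  /* histogram with a fitted normal density curve overlaid */
  histogram weight / normal;
run;
```

Points that stay close to the reference line in the qq plot are consistent with normally distributed weights.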
The var statement is optional. Without this statement, a univariate analysis is performed for all numeric variables in the order they appear in the dataset.

__Proc capability__

Proc capability, which is part of SAS/QC, has a variety of functions, including normal qq plots, histograms, and probability plots, although in elementary statistical analysis it is mostly used to create normal qq plots. A normal qq plot and a histogram can be created with the code from the univariate example by replacing the keyword univariate with capability.

__Proc sort__

Proc sort sorts the observations of a dataset by one or more variables, in either ascending or descending order. For example:

```sas
proc sort data=a1 out=a2;
  by gender;
run;
```

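The by statement controls both the direction and the keys of a sort. A sketch of a descending sort and a two-key sort on a1; the output dataset names a2_desc and a2_two are hypothetical and chosen only so that a1 and a2 stay unchanged:

```sas
/* Descending order: the keyword descending precedes the variable it modifies */
proc sort data=a1 out=a2_desc;
  by descending gender;
run;

/* Two keys: sort by gender, then by age within each gender (both ascending) */
proc sort data=a1 out=a2_two;
  by gender age;
run;
```

In a2_desc the males (M) come first; in a2_two the first observation is the youngest female.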
The observations of a1 are sorted by gender in ascending order (the default), and the sorted data are saved in a new dataset named a2. Without the out=a2 option, the original dataset a1 is replaced by the sorted one. To sort in descending order, put the keyword descending before the variable it applies to in the by statement, e.g. by descending gender. To sort by more than one variable, list all of them in the by statement. For example, by gender age sorts the observations by gender in ascending order and then, within each gender, by age in ascending order.

__Proc means__

This procedure produces simple univariate descriptive statistics for numeric variables. It also calculates confidence limits for the mean and can identify extreme values and quartiles. Here's an example that computes the mean and its confidence limits:

```sas
proc means data=a2 alpha=0.05 clm mean median n min max;
run;
```

Because there is no var statement, the mean, median, sample size (n), minimum, maximum, and 95% confidence limits for the mean are computed for every numeric variable, here age and weight. The alpha= option sets the significance level, so the confidence level is 1 - alpha (95% for alpha=0.05), and clm tells SAS to calculate the two-sided confidence limits for the mean. Gender is a character variable, so no statistics are computed for it.

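For reference, the limits that clm reports are the usual t-based confidence interval for the mean, where n is the number of nonmissing values, x̄ the sample mean, s the sample standard deviation, and α the value given in the alpha= option:

$$
\bar{x} \pm t_{1-\alpha/2,\;n-1}\,\frac{s}{\sqrt{n}}
$$

With alpha=0.05 the quantile is the 97.5th percentile of a t distribution with n - 1 degrees of freedom, which gives a 95% interval.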
If you have a lot of variables and only want statistics for some of them, use the var statement and list the variables after the keyword var. If you want the statistics separately for each group, use the by statement. For example,

```sas
proc means data=a2 alpha=0.05 clm mean;
  var weight;
  by gender;
run;
```

tells SAS to compute the mean and confidence interval of weight separately for each value of gender (F and M). When a by statement is used, the observations must already be sorted by the by variables before proc means runs, which is why the proc means examples use a2, the dataset sorted by gender.

__Proc summary__

It computes descriptive statistics on numeric variables in a SAS dataset and writes the results to a new SAS dataset instead of printing them by default. Apart from that default, the syntax of proc summary is essentially the same as that of proc means. An example follows:

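```sas
proc summary data=a2 print;
  var weight;
  by gender;
  /* save the default statistics (N, MIN, MAX, MEAN, STD) for each gender;
     the dataset name sumstats is illustrative */
  output out=sumstats;
run;
```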

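The output statement can also name individual statistics and the variables that hold them. A sketch, assuming a2 is a1 sorted by gender as in the proc sort example; the names wsum, n_wt, mean_wt, and sd_wt are hypothetical. Since print is omitted, the results only appear once the new dataset is printed:

```sas
proc summary data=a2;
  var weight;
  by gender;
  /* one observation per gender with the count, mean, and standard deviation of weight */
  output out=wsum n=n_wt mean=mean_wt std=sd_wt;
run;

proc print data=wsum;
run;
```

Besides gender and the requested statistics, wsum also contains the automatic variables _TYPE_ and _FREQ_.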
Proc summary will not run without either the print option or the output statement.

__Proc corr__

This procedure calculates correlations between numeric variables. By default it computes the Pearson correlation coefficient for each pair of variables together with the P-value for testing whether that coefficient is zero.

```sas
proc corr data=a1;
  var age weight;
run;
```

A correlation coefficient matrix is created:

```
Pearson Correlation Coefficients, N = 7
        Prob > |r| under H0: Rho=0

              age       weight

age       1.00000     -0.43017
                        0.3354

weight   -0.43017      1.00000
           0.3354
```

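For reference, the Pearson coefficient r for n pairs (x_i, y_i) and the t statistic behind its P-value are

$$
r = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2\;\sum_{i=1}^{n}(y_i-\bar{y})^2}},
\qquad
t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}},
$$

and the P-value is the two-sided tail probability of t under a t distribution with n - 2 degrees of freedom. With r = -0.43017 and n = 7 this gives t ≈ -1.07 on 5 degrees of freedom, consistent with the printed P-value of 0.3354.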
The correlation coefficient between age and weight in this example is -0.43017, and 0.3354 is the P-value for testing the null hypothesis that the population correlation is zero. Because the P-value is greater than 0.05, the null hypothesis of zero correlation cannot be rejected at the 5% level.

__Proc glm__

It performs simple and multiple regression, analysis of variance (ANOVA), analysis of covariance, multivariate analysis of variance, and repeated measures analysis of variance.

```sas
proc glm data=a1;
  model weight=age;
  output out=a3 p=pred r=resid;
run;
```

performs a simple linear regression with weight as the dependent variable and age as the independent variable. The output statement saves the predicted values of weight (p=pred) and the residuals (r=resid) in a new dataset called a3. For multiple regression, where you have more than one independent variable, list all of them on the right-hand side of the equal sign in the model statement, separated by spaces. For example, with an additional numeric variable height (not part of a1):

```sas
model weight=age height;
```

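The a3 dataset written by the output statement in the simple regression example holds pred and resid alongside gender, age, and weight. A common use is a plot of residuals against predicted values. A minimal sketch, assuming a3 was created by that example and that proc sgplot is available in your SAS installation:

```sas
proc sgplot data=a3;
  /* residual versus predicted value for each observation */
  scatter x=pred y=resid;
  /* horizontal reference line at a residual of zero */
  refline 0 / axis=y;
run;
```

Residuals scattered evenly around the zero line, with no visible trend, support the linear model.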
For ANOVA, a class statement listing the categorical variables is needed before the model statement. The following code is a one-way ANOVA of the effect of gender on weight. It tests whether the mean weight is the same for females and males.

```sas
proc glm data=a1;
  class gender;
  model weight=gender;
run;
```

__Proc reg__

Proc reg is a procedure dedicated to regression and offers regression features that proc glm does not. It allows multiple model statements in one procedure, can perform model selection, and can plot regression statistics such as predicted values, residuals, and normal qq plots. You can specify several plot statements for each model statement, and more than one plot in each plot statement.

```sas
proc reg data=a1;
  model weight=age;
  plot weight*age;
  plot predicted.*age;
  plot residual.*age;
  plot nqq.*residual.;
run;
```

In the above example, a simple regression is performed with weight as the response and age as the explanatory variable. The plot statements request four plots: weight versus age, predicted values of weight versus age, residuals versus age, and the normal quantiles of the residuals versus the residuals (a normal qq plot). Predicted., residual., and nqq. are keywords for statistics that proc reg computes. The trailing dot is part of each keyword, so make sure you keep it.

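The plot statement draws traditional (non-ODS) graphics. When ODS Graphics is available, proc reg can produce similar diagnostics through the plots= option. A sketch, assuming ODS Graphics is enabled in your installation: fitplot shows the fitted line with confidence and prediction limits, residuals plots the residuals against age, and qqplot is a normal qq plot of the residuals.

```sas
ods graphics on;

/* plots(only) suppresses the default diagnostic panel and draws just these three plots */
proc reg data=a1 plots(only)=(fitplot residuals qqplot);
  model weight=age;
run;
quit;  /* proc reg is interactive; quit ends it */

ods graphics off;
```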