SAS TUTORIALS

What is an option or statement?

A statement is a command nested within a procedure that tells SAS more about the procedure you want to perform or, in some cases, lets you make your analysis more specific. An option further describes a statement or, in some cases, the procedure itself. Some statements are required, while others are optional.

The Var Statement
In many of the above SAS procedures, a var statement is either required or useful when you are dealing with a large data set with many variables. For example, if you are using proc corr (outlined above), you may want to tell SAS which variables in your dataset you want correlations for. With three variables of interest, it would work as follows:

proc corr data=yourdatasetname;
var V1 V2 V3;
run;

If you have a dataset with many variables, but you only want to check normality assumptions for a few of them, use:

proc univariate data=yourdataset;
var response1 response2;
run;


The By Statement
The by statement is required in proc sort. Once a dataset has been sorted by a variable, you can use a by statement with that variable in other procedures to run the analysis separately for each group. For example, say you were interested in performing regressions of weight on height by gender. First, you would sort your dataset by gender as follows:

proc sort data=yourdataset;
by gender;
run;

Then, you can use the sorted data to obtain two separate regressions, one for males and one for females, as follows:

proc reg data=yourdataset;
model weight=height;
by gender;
run;

(We will get to the model statement shortly!)


The Class Statement
The class statement tells SAS that a variable in your data set is categorical. For example, if you had data from an experiment with 20 subjects where five subjects received treatment 1, five received treatment 2, five received treatment 3, and the final five received treatment 4, treatment would be considered a categorical variable and thus must appear in the class statement of the glm procedure. You will most likely use the class statement in the univariate, means, and glm procedures. In proc glm, it is required only when the model includes a categorical variable such as treatment or gender. The coding of the above example could look as follows:

proc glm data=yourdataset;
class treatment;
model resp=treatment;
run;

where resp is the response for each of the 20 subjects.
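
Since the class statement is also used in the means procedure, a minimal sketch of that usage might look like the following. It assumes the same placeholder dataset and variable names as the glm example above:

```sas
proc means data=yourdataset;
/* treatment is categorical, so it goes in the class statement */
class treatment;
/* resp is the numeric variable to summarize within each treatment level */
var resp;
run;
```

This would report summary statistics for resp separately for each of the four treatment levels. Unlike the by statement, the class statement in proc means does not require the data to be sorted first.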


The Model Statement
By now, you have already seen the model statement in a few of the above examples. The model statement tells SAS which model you would like to use for your data. The dependent or response variable always goes on the left of the equals sign while the independent variable(s) come after the equals sign on the right.

The above glm example shows how the model statement works. For the procedures you have learned thus far, the model statement is only required (and accepted) in the glm and reg procedures. The model statement also supports many options in both glm and reg. For example, in the glm model statement, options exist for choosing the types of sums of squares and asking for confidence and prediction intervals. In proc reg, the model statement has options for these same things, plus many others such as stepwise regression and specialized regression diagnostics. An example of how to use options in the model statement is as follows:

proc reg data=yourdataset;
model weight=height / stb;
run;

This follows the earlier example of weight and height. You must always use a forward slash to tell SAS that options follow the model. You can use as many options as you need in one model statement, as long as they are separated by at least one space. The stb option asks for the standardized regression coefficients.
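
To illustrate several options in one model statement, the sketch below adds clb, the proc reg option that requests confidence limits for the parameter estimates, next to stb. The choice of clb is only for illustration and is not part of the original example:

```sas
proc reg data=yourdataset;
/* stb: standardized coefficients; clb: confidence limits for the estimates */
model weight=height / stb clb;
run;
```

Both options sit after the single forward slash and are separated only by a space.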

The Means and Lsmeans Statements
Often in an analysis, once differences are found among groups, we would like to see exactly where those differences occur; in SAS this is done with the means and lsmeans statements in proc glm. Both statements can be used with a variety of options. If you have no missing values in your data set, your design is balanced, and you use no covariates, you can use the means statement. However, if missing values exist, the design is unbalanced, or you have covariates in your model, you must use lsmeans to obtain the proper means and comparisons. An example follows:

proc glm data=yourdataset;
class treatment;
model resp=treatment;
means treatment / lines tukey bon;
run;

The means statement will perform means comparisons for all four treatment groups in this case, using the options lines, tukey, and bon. The lines option displays the means comparisons in a more readable format. The tukey and bon options request two types of means comparison procedures, Tukey and Bonferroni. Many other options for different means comparison procedures also exist (e.g., Dunnett, least significant difference, Duncan, Scheffe, Student-Newman-Keuls). When using the lsmeans statement, the syntax is a bit different:

lsmeans treatment / adj=tukey stderr;

When using lsmeans, you must use the “adj=” option to obtain Tukey and Bonferroni comparisons, for example. The stderr option gives the standard errors for the least squares (ls) means.

Options in the Procedures
Some options contained in the procedures come not in the model or the means statements, but directly after the proc statement. An example of this is:

proc glm data=yourdataset alpha=.05;
class treatment;
model resp=treatment;
means treatment / lines tukey bon;
run;

In this example, it becomes apparent that “data=” is itself an option in the proc statement. The alpha=.05 option tells SAS that for any confidence intervals, significance testing, etc., you want an alpha of .05. (Any tests in the model, lsmeans, and means statements, and any confidence intervals output with the output statement, are performed at the .05 level.) Another useful example of options in the proc statement is with proc univariate. By using options in the proc statement, you can obtain stem-and-leaf plots, normal probability plots, boxplots, and tests for normality.

proc univariate data=yourdataset normal plot;
var response1 response2;
run;

The normal option gives tests of normality, including the Shapiro-Wilk test, while the plot option produces the stem-and-leaf plot, boxplot, and normal probability plot.

Output Statements (used in many procedures)
How does the output statement normally work? The basic function of the output statement is to create a new dataset containing the information in the old dataset plus any new diagnostics or statistics that the procedure has created. For example, if you specify a dataset for your reg procedure, you may want to output that dataset along with predicted values and residual values. Options exist for obtaining predicted values, residual values, and other statistics and diagnostics. This is how it works:

proc reg data=one;
model response=var1 var2;
output out=two r=res p=pred;
run;

So, now you have a data set named “two” which contains everything that dataset one contains, plus the predicted and residual values from your proc reg model. Now, you can make diagnostic plots as follows:

proc gplot data=two;
plot res*pred;
plot res*var1;
plot res*var2;
run;

These plots can help to assess normality, independence of observations, and constancy of variance. There are many other options besides residual and predicted values depending on which procedure you are using for your analysis. By looking in the SAS help menu, you can find the keywords (e.g., for residuals, the keyword is just r=) for other diagnostics such as Cook’s distance, standard errors, prediction, etc.
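
As a sketch of how additional keywords fit into the same output statement, the example below also requests Cook's distance (cookd=) and the standard error of the mean predicted value (stdp=) from proc reg. The dataset name three and the variable names cooksd and se_pred are hypothetical names chosen for illustration:

```sas
proc reg data=one;
model response=var1 var2;
/* keyword=name pairs: the keyword selects the statistic, the name is yours to choose */
output out=three r=res p=pred cookd=cooksd stdp=se_pred;
run;
```

Each keyword follows the same pattern as r= and p=, so further diagnostics can be appended in the same way.
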
Another example of an output statement used with the proc univariate statement:

proc univariate normal plot data=old;
var y1;
output out=new max=maximum min=minimum mean=mean;
run;

This will give the mean, maximum, and minimum values for y1 in the data set “new”. Note that max, min, and mean are the keywords by which SAS recognizes that you are asking for these values. What comes after the equals sign (=) is whatever name YOU choose for the new variable.

How can I be sure that correct values and variables were output? The best way to check whether your output statement worked is to use the proc print procedure as follows (building on the univariate example above):

proc print data=new;
run;

This will print out all variables and values in your new data set.

How does SAS know which dataset to use? If you are working with multiple datasets that you have output from multiple procedures (e.g., you have one data set that SAS made from a proc glm and another from a proc reg), you should always name the data set you wish to use with the “data=” option; otherwise, SAS will by default use the most recently created dataset.
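
A minimal sketch of this situation, reusing the earlier examples, is shown below. The output dataset names glmout and regout are hypothetical:

```sas
/* first procedure writes its predicted values and residuals to glmout */
proc glm data=yourdataset;
class treatment;
model resp=treatment;
output out=glmout p=pred r=res;
run;

/* second procedure writes its predicted values and residuals to regout */
proc reg data=one;
model response=var1 var2;
output out=regout p=pred r=res;
run;

/* name the dataset explicitly; without data=, the most recently
   created dataset (regout here) would be printed instead */
proc print data=glmout;
run;
```

Naming the dataset in every proc statement keeps each step tied to the data you intend, regardless of the order in which the steps were run.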
