A SAS program is composed of two kinds of steps. Data steps handle data cleaning and formatting. Procedures (PROC steps) perform the required statistical analyses and/or present the results graphically. Data steps are important for several reasons. First, the dataset may not be in a SAS-compatible format, although this is usually not the case for the datasets in class examples or exercises. Second, you sometimes need to extract a subset of the variables or observations from a dataset before performing an analysis. Third, different procedures may require the same data in different formats, and a data step is needed to transform the dataset into the appropriate format for each procedure.
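The following sketch illustrates this two-part structure. A data step reads a few raw values into a SAS dataset, and a procedure then summarizes them. The dataset name demo and the variables x and y are made up for the illustration; PROC MEANS is just one example of a procedure.

```sas
/* Data step: read raw values into a dataset (demo, x, y are hypothetical names) */
data demo;
input x y;
cards;
1 2
3 4
5 6
;
run;

/* Procedure step: compute summary statistics (N, mean, std dev, min, max) for x and y */
proc means data=demo;
var x y;
run;
```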
Mathematical operations are listed in the following table:
Manipulating variables in a data step (recoding, if/then statements)
To illustrate the data manipulation, let’s take a sample data set:
data a1;
input gender $ age weight;
cards;
M 13 143
M 16 132
F 19 140
M 20 120
M 15 110
F 18 95
F 22 105
;
run;
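You can check how SAS stored each variable by running a PROC CONTENTS step on a1 after the data step. In its variable listing, gender should appear with type Char because of the dollar sign in the input statement. Age and weight should appear with type Num.

```sas
/* List the variables in a1 together with their types (Char or Num) and lengths */
proc contents data=a1;
run;
```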
Suppose you want a data set of females only. The following SAS code creates a new data set called aa that stores those observations whose value for the variable gender is not 'M'. The set a1 statement after the data aa statement tells SAS to read the observations of dataset a1 into the new dataset aa. The if/then statement then deletes the observations whose gender variable has the value 'M'. M is written in quotation marks because gender is a character (text) variable, and character values must be given as quoted strings. The dollar sign ($) after gender in the input statement tells SAS that gender is a text variable rather than a numeric variable (i.e., gender is coded as M and F rather than as 1 denoting male and 2 denoting female).
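data aa;
set a1;
if gender eq 'M' then delete;
run;
or, since gender takes only the values M and F in this dataset, use a subsetting if statement, which keeps only the observations for which the condition is true:
data aa;
set a1;
if gender eq 'F';
run;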
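Conditions can also be combined with and (or or) in a subsetting if statement. The sketch below keeps males aged 16 or older and then prints the result to check which observations survived. The dataset name ba is hypothetical. With the sample data, the printout should show two observations: M 16 132 and M 20 120.

```sas
/* Keep an observation only if both conditions are true (ba is a hypothetical name) */
data ba;
set a1;
if gender eq 'M' and age ge 16;
run;

/* Print the new dataset to verify which observations were kept */
proc print data=ba;
run;
```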
If you want to keep only those who are 16 years or older, you can do:
data ab;
set a1;
if age lt 16 then delete;
run;
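The same if/then logic can also recode a variable instead of deleting observations. In the sketch below, age is grouped at the same cutoff of 16, and a new variable is created with an arithmetic expression. The variable names age_months and agegrp and the group labels are made up for the illustration. The length statement keeps the character values from being truncated.

```sas
data ad;
set a1;
/* Arithmetic: * multiplies, so this converts age in years to months (hypothetical variable) */
age_months = age * 12;
/* Declare the character variable's length before assigning values to it */
length agegrp $ 5;
/* Recode age into two groups; a missing age gets a blank group */
if missing(age) then agegrp = ' ';
else if age lt 16 then agegrp = 'young';
else agegrp = 'older';
run;
```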
You can also choose which variables a dataset retains for analysis, using either the keep or the drop statement. For example, if you do not need the variable age in your analysis, you can do:
data ac;
set a1;
drop age;
run;
or
data ac;
set a1;
keep gender weight;
run;
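The keep statement lists the variables to retain, so this version creates a dataset that contains only the two variables specified, gender and weight. The drop statement version produces the same result by removing age.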
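The same selection can also be written as a keep= (or drop=) dataset option on the set statement. The sketch below shows that variant. One difference is that the option limits which variables are read from a1 in the first place, so age would not be available for calculations inside this data step. The keep statement, by contrast, only limits which variables are written to ac.

```sas
/* keep= dataset option: read only gender and weight from a1 */
data ac;
set a1(keep=gender weight);
run;
```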