Contents

Course Notes | SFU STAT342 - SAS

SAS: Statistical Analysis System

Variables in SAS

SAS has only two types of variables: character and numeric

Tips:

  • Use $ after the character variable
  • Use . to replace the missing values
  • Case INsensitive

Invalid Names:

  • 1_begins_with_a_number
  • contains blanks
  • contains-invalid-characters%

Example 1: Basic Steps in SAS programs

  1. DATA: read/write/manipuplate the data and perform calculations
  • after data step, SAS stores the data in its own special form called a SAS data set
  1. PROC: process SAS datasets in analyzing
  • proc contents shows the descriptor portion of SAS data set
  • proc print shows the data portion in a table
  1. RUN
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
/* Example 1: Read the dataset and print */

data Studgrade; /* data step: assign a name for the dataset and enter the data */
input StudID Midterm Final Grade $; /*Variables in the dataset and enter the data*/
datalines;
101 98 86 A
102 49 60 C
103 98 80 A
104 90 98 A+
105 60 80 B+
106 .  80 C-
;
run;  /*end the data step*/

proc print data = studgrade;
/* proc step: perform any analysis and  print the dataset. Here we are only printing the dataset*/
run;


/* View discriptor portion of the dataset 'Studgrade'.*/

/*proc contents data = Studgrade;
run;*/

Week 2: Input Styles (2021-9-16)

infile statement reads the data stored in an external file

  • must execute before the input statement

Four ways to describe values of a dataset in the input statement

Example 2: Infile & List Input

  • raw data is separated by at least one spaces
  • all missing data must be indicated by .
  • $ should be after the character variable
1
2
3
4
data Students;
    infile "...path/data.txt";
    input Major $ Age Height Weight;
run;

Example 3: Column Input

Column Input can be used when data file does not have spaces between values but in fixed columms

  • missing data left blank
1
2
3
4
data Students;
    infile "...path/column_data.txt";
    input Major $ 1-4 Age 5-6 Height 7-9 Weight 10-12;
run;

Example 4: Print Specific Variables with title without observation column

  • noobs
  • var
  • title: global variables
1
2
3
4
5
title 'Data Table';

proc print data = Student noobs;
  var Name Height Weight;
run;

Example 5: Formatted Input using column pointer

@n pointer moves the input pointer to specific column n
+n moves the pointer forward n columns that is relative to the current position

1
2
3
4
5
6
7
8
9
data student;
  input @6 name $11. /* start with col1 an read next 4 cols */
        @27 height 2.
        +5 DOB mmddyy8.
        +1 calorie comma5.;
datalines;
1302 Benedictine Arnold 2 68 190 11/30/95 2,432
;
run;

Example 6: Named Input with “=”

1
2
3
4
5
6
7
8
9
data info;
  input @1 id 4.
        @9 name = $ 6. /* Once the input start to read with Named Input, all remaining values must be read with Named Input */
        @6 height = 2.;
datalines;
1024 height=65 name=Smith
/* index
12345|xxxxx|678|xxx|90123
*/

Example 7: Show the Frequency Table

  • proc freq
  • tables
1
2
3
4
title 'Frequencies by Major`;
proc freq data = Students;
     tables Major;
run;

Example 8: Computes the mean, std, min/max of variables

  • proc means
1
2
3
4
title 'Summary Statistics'
proc means data = Students;
  var Age Weight Height;
run;

Example 9: Compute and Add a new variable

1
2
3
4
5
data Students;
  infle ".../student_data.txt";
  input Height Weight;
  BMI = Weight/(Height/100)**2;
run;

Week 3 - 4. Reading data from external files

Example 10: Read Data from .csv files

dsd option performs serveral functions:

  • changes the default delimiter from a blank to comma,
  • treats two-comma in a row (, ,) as missing value
  • strips “quotes” from the character values
1
2
3
4
5
6
filename CSVfile '...path/data.csv'; /*alias*/

data CSVimport;
  infile CSVfile dsd;
  input Name $ Age Height Weight;
run;

Example 11: Read Data from other delimiters

'09'x represents the TAB in hexadecimal value

1
2
3
4
5
data other_dlm;
infile '.../other_dlm.txt' dlm=':';
/* infile '.../other_dlm.txt' dlm='09'x; */
input Age Height;
run;

Example 12: Reading data values by specifying infile options with the datalines statement

1
2
3
4
5
6
7
data dsd_datalines;
  infile datalines dsd;
  /* dsd could drop the "quote" vs "dlm,=','"*/
  input Age Height Weight;
datalines;
"Black" 50, 68, 155
;

Example 13: Read data using Informats with list input

Colon : enables us to use List Input (blank) with an Informat after a variable name

1
2
3
4
5
6
7
8
data informats_list;
  infile ".../.csv" dsd;
  input subj : 1.
      Name : $19.
      DOB  : mmddyy10.
      Salary: dollar8.;
  format DOB date9. Salary dollar8.;
run;

Uncommon Case: csv file using blanks instead of commas as delimiter

  • & modifier is like the colon, let SAS use supplied informat but the delimiter is now ( >= 2) spaces instead of 1.
  • Therefore, there should be more than one space between Name and DOB
1
2
input Subj : 1.
      Name & $19.

Example 14: Read Data using PROC IMPORT

PROC IMPORT Options:

  • datafile: file path that we want to import
  • out: assign name
  • dbms: option to identify the type of file being imported
    • CSV
    • DLM: for delimited file (default is a blank)
    • JMP
    • TAB: tab-delimited values, .txt
    • XLSX: excel file
  • replace: overwrite the exisiting file
  • getnames: if yes, read the first row of data as variable names
1
2
3
4
5
proc import datafile = "...path/auto.csv"
            out = automob
            dbms = csv replace;
            getnames = yes;
run;

Example 15: Create a new variable after PROC IMPORT

1
2
3
4
data auto;
  set automob; /* Copy all obs from automob dataset*/
  TOT_MILATE = MPG*TANK_VOL; /* Create a new variable column */
run;

Example 16: Read Data from Excel files

1
2
3
4
5
proc import out = auto1 datafile = "path/auto.xlsx"
            dbms = xlsx replace;
            sheet = "auto"; /* specify which sheet of the excel file should be imported */
            getnames = yes;
run;

Example 17: Read Data from URL

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
/* filename probly TEMP; */
filename probly 'path/download.csv';

proc http
  url='http://.../probly.csv'
  method="GET"
  out=probly;
run;

proc import file = probly out = probly replace dbms = csv;
run;

proc print data = probly;
run;

Week 5. Working with Dates (2021-10-14)

Example 18: Read Date from Raw

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
data fourdates;
input @1 Subject $3.
      @5 DOB mmddyy10.
      @16 VisitDate mmddyy8.
      @25 TwoDigit mmddyy8.
      @34 LastDate date9.;
format DOB VistDate TwoDigit mmddyy10.
       LastDate date9.;
datalines;
001 10/21/1950 05122003 08/10/65 23Dec2005
002 01/01/1960 11122009 09/13/02 02Jan1960
;
run;

Example 19: Compute the difference between two dates

  • yrdif() computes the difference
    • 'Actual' lets SAS compute the decial number of days between two dates
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
data ages;
set fourdates;
Age = yrdif(DOB, visitDate, 'Actual');
/* Age = (visitDate - DOB) / 365 */
run;
title 'Listing of Ages';
proc print data = ages noobs;
  var Subject DOB VisitDate Age;
  format Age 5.1 /*Value of Age in 5 bytes, with one position to the right of the decimal place*/
run;

We can compute the age as an integer

1
2
Age = int(yrdif(DOB, visitDate, 'Actual'));
Age = round(yrdif(DOB, visitDate, 'Actual'));

For Constant Date

  • one/two-digit day
  • three-character month abbreviation
  • four-digit year
  • single/double quotation followed by d
1
Age = yrdif(DOB, '01Jan2006'd, 'Actual');

For Current Date

  • use today()
1
CurAge = int(yrdif(DOB, today(), 'Actual'));

Example 20: Extract the date

There are SAS fuctions to extract the day of the week, day of the month, month, and year from a SAS date.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
data extract;
  set four_dates;
  Day = weekday(DOB); /* Sunday: 1, Monday: 2 */
  DayofMonth = day(DOB); /*1-31*/
  Month = month(DOB); /*1-12*/
  Year = year(DOB); /* xxxx */
run;

proc print data = extract noobs;
  var DOB Day -- Year; /* Include all of the variables from Day through Year in the order they are stored in the SAS dataset*/
run;

Example 21: Create a SAS date

  • Use mdy()
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
data mdy;
  input Month Day Year;
  Date = mdy(Monty, Day, Year);
  format Date mmddyy10.;

datalines;
10 21 1950
1 15 5
3 . 2005
5 7 2000
;
run;

For Missing date

  • missing()
1
2
3
4
5
6
data substitute;
  set mdy;
  if missing(Day) then Date = mdy(Month,15,Year);
  else Date = mdy(Month,Day,Year);
  format Date worddate.;
run;

Example 22: Compute Date Intervals

INTCK: computes the number of intervals between two dates

  • INTCK('interval', from, to)
1
2
3
4
5
/* _NULL_ dataset is used to execute Data Step without observations*/
data _null_;
  days = intck('day', today(), '25dec2021'd); /* How many days till Chrismas Day. */
  put days; /* Write to the SAS log. i.e. results will be displayed in SAS log. */
run;

INTNX: computes a date after a given number of intervals

  • INTNX('interval', from, to)
1
2
3
4
5
data _null_;
  date = intnx('month', '24dec2017'd, 2, 'sameday');
  format date date9.;
  put date;
run;

Example 23: US Presidential Inaugurations and deaths

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
data year_days;
input @1 Date anydtdte11. /* Informats that is a alternative to other formats such as DATEw. , MMDDYYw. and YYMMDDw. */
      @13 Event $28.;

Prev_date = lag(Date); /* Shift downward Date by one row as a new column */
Years = intck('year', Prev_date, Date, 'continues'); /* number of completed years */
Anniv = intnx('year', Prev_date, Years, 'sameday');

Days = intck('day', Anniv, Date);

Anniv_v2 = put(Anniv, ddmmyy10.); /* Convert variable type*/

format Date Prev_date Anniv Date9.;

datalines;
Apr 30,1789 Washington Inaug
Mar 4, 1797 J Adams Inaug
Dec 14,1799 Washington Death
Mar 4, 1801 Jefferson Inaug
Mar 4, 1909 Madison Inaug
Mar 4, 1917 Moroe Inaug
Mar 4, 1825 JQ Adams Inaug
Jul 4, 1826 Jefferson Death
Jul 4, 1826 J Adams Death;
run;

/* PDF Output */
ods pdf file="...\intck_intnk_out.pdf";
proc print data = year_days;
var Event Date Prev_date -- Anniv_v2;
run;

ods pdf close;

Week 6. Working with time

datetime

SAS datetime for Dec. 06, 1962 at 11:13:04 am equals 92488384 seconds from Jan. 01, 1960 at midnight

Time FORMAT:

  • hhmm. <=> 0:15
  • hour5.2 <=> 0.25 (hours and decimal fractions of hours)
  • time8. <=> 0:15:00
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
/*
 SAS datetime
*/

data _null_;

Time=92488384;
*Time = 1; /* 01JAN60:00:00:01 */
format Time datetime.;
put Time; /* 06DEC62:11:13:04 */
run;

Time Functions

  • time() returns the current time as a SAS time value.
  • hms(h,m,s) returns a SAS time value for the given hour(h), minutes(m), and seconds(s).
  • hour(time) returns the hour portion of a SAS time value(time).
  • minute(time) returns the minute portion of a SAS time value(time).
  • second(time) returns the second portion of a SAS time value(time).
  • intnx( ) and intck( ) that we explored on SAS dates can also be used on SAS times.
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
data diet2;
 set diet;

curtime = time(); /*Current time*/
wt_hr = hour(wt_time);
wt_min = minute(wt_time);
wt_sec = second(wt_time);
wt_time2 = hms(wt_hr, wt_min, wt_sec);

format curtime wt_time wt_time2 time8.;

proc print data=diet2;
	title 'The diet data set with five new variables';
    var subj curtime wt_time wt_hr wt_min wt_sec wt_time2;
run;

Export SAS code with output in the log to a PDF file

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
/*Export SAS code in the log to a PDF file*/
ods pdf file = "Time_ex_code_.pdf";
proc document name = temp;
   /* imports any text file, or a SAS program */
   import textfile = "/home/u59686016/STAT342W06/Time_ex.sas" to ^ ;
   /* ^ refers [current directory] of the output document */
   replay;
   /* we can replay the document whenever we need it and replay only the output items that we are interested in.
      The document can be replayed using the REPLAY statement*/
run;
proc print data=diet2;
	title 'The diet data set with five new variables';
    var subj curtime wt_time wt_hr wt_min wt_sec wt_time2;
run;
quit;
ods pdf close;

/*Note that when you are exporting SAS code to a PDF, always start with a fresh SAS program. Otherwise, it will print
the same SAS code more than one time(depending on the history) in the same document.*/

Create, Input, Export Permanent SAS datasets

SAS datasets can be temporary or permanent.

  • Temporary SAS datasets only exist during the current SAS session.
    • data Temp; ==> data work.Temp
    • SAS uses the default libref called work that SAS creates automatically every time we open a SAS session.
  • Permanent SAS datasets are saved to a location on the computer and exist after exiting SAS.
    • use LIBNAME statement to create and specify the name of the library followed by the directory or folder where we want to store the permanent dataset.
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
libname mylib '/home/u59686016/STAT342W06';
/* The libref must [not be more than 8] charecters in length */

data mylib.test_scores;
  input ID $ Score1-Score3 Name $;

datalines;
1 90 95 98 Cody
2 78 77 75 Preston
3 88 91 92 Russell
;

proc print data = mylib.test_scores;
run;

proc contents data = mylib.test_scores;
run;

/*List all the SAS datasets in a sas library*/

proc contents data = mylib._all_ nods;
/*_all_: all dataset names
  nods: show name of the datasets only, omitting the detail listing for each data set.*/
run;

Input this permanent library and compute an average score for each subject in Test_score data set.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
libname mylib '/home/u59686016/STAT342W06/';

data new;
 set mylib.test_scores;
 AvgScore = mean(of score1-score3);
run;

proc print data=new;
run;

Export this permanent library into files of different formats

  • proc export
  • outfile
  • dbms
  • label
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
/*Writing txt file*/

proc export data=mylib.test_scores
      outfile='/home/u59686016/STAT342W06/testscore.txt'
      dbms=dlm;
      delimiter=' ';
run;

/*Writing csv file*/

proc export data=mylib.test_scores
      outfile='/home/u59686016/STAT342W06/testscore.csv'
      dbms=csv replace;
run;

/*Writing a tab separated file*/

proc export data=mylib.test_scores
      outfile='/home/u59686016/STAT342W06/testscore_tabs.txt'
      dbms=tab replace;
run;

/*Writing an excel file*/

proc export data=mylib.test_scores
      outfile='/home/u59686016/STAT342W06/testscore.xlsx'
      dbms=xlsx replace;
      sheet = "test";
run;