
SAS Proc means: How to capture non default statistics in output dataset such as nmiss p1 p99 etc?

Tags:

sas

Original Question:

By default, PROC MEANS writes N, MIN, MAX, MEAN and STD to the output dataset. How do I add NMISS, P1, P5, etc. to this list?


Additional info 1:

I want statistics on all numeric variables in my dataset, so I use _numeric_ in the VAR statement.

I want each statistic in its own row, with the variables as columns.

 Obs _TYPE_ _FREQ_ _STAT_    var1   var2 var3 etc    
 1     0    84829  N      84826.00
 2     0    84829  MIN        0.00
 3     0    84829  MAX     5000.00
 4     0    84829  MEAN     151.22
 5     0    84829  STD     1989.47
 6     0    84829  NMISS       3
 7     0    84829  P1         2.00
 8     0    84839  P99     4999.00

How do I do this?

Thanks!

Zenvega asked Mar 24 '23 15:03


2 Answers

Assuming you are using the OUTPUT statement in PROC MEANS (and not ODS OUTPUT), you can control which statistics go into that dataset like so:

proc means data=sashelp.class;
var age;
class sex;
output out=mymeans nmiss= P1= P5= /autoname;
run;

The full list of statistic names is available in the PROC MEANS documentation under "statistic keywords".
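
If the suffixes that /autoname generates are not what you want, the OUTPUT statement also accepts explicit names after each keyword. A minimal sketch, assuming the sashelp.class sample data; the dataset name mymeans_named and the column names (age_nmiss and so on) are made up for illustration:

```sas
/* Sketch: name each output column explicitly instead of using /autoname. */
/* mymeans_named and the age_xxx / height_xxx names are illustrative only. */
proc means data=sashelp.class noprint;
  var age height;
  output out=mymeans_named
    n=     age_n     height_n       /* names follow the order of the VAR statement */
    nmiss= age_nmiss height_nmiss
    p1=    age_p1    height_p1
    p99=   age_p99   height_p99;
run;
```

NOPRINT only suppresses the printed report; the OUTPUT dataset is still written.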

You can also achieve the same result (with a slightly different output format) with ODS OUTPUT.

ods output summary=mymeans;
ods trace on;
proc means data=sashelp.class nmiss p1 p5;
var age;
class sex;
run;
ods trace off;
ods output close;

ODS TRACE ON/OFF is only there to show the name of the table that is created (i.e., 'summary'); it isn't needed in production. With this approach you request statistics in the PROC MEANS statement, the same way you would request them for the output window.

Based on your edits, you want it transposed (one row per statistic). You can't get that directly, but the transposition isn't very hard.

proc means data=sashelp.class nmiss p1 p5;
class sex;
var _numeric_;
output out=mymeans n= mean= nmiss= p1= p5= /autoname ;
run;

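data mymeans_out;
  set mymeans(drop=_type_ _freq_);
  by sex;
  array numvars _numeric_;                      /* every autonamed statistic column */
  format var stat $32.;
  do _t = 1 to dim(numvars);
    var=scan(vname(numvars[_t]),1,'_');         /* original variable name */
    stat=scan(vname(numvars[_t]),-1,'_');       /* statistic suffix added by autoname */
    value = numvars[_t];
    output;                                     /* one row per variable and statistic */
  end;
  keep sex var stat value;
run;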

This has a few limitations. If your variable names already contain underscores, the var=scan... line will need to be rewritten to find the last underscore and use substr, i.e. var = substr(vname(...),1,position_of_last_underscore - 1). Stat should be fine, since it uses -1 (scanning from the right). If your variable names might exceed ~23 characters, you may not get the exact variable name back, as it may be truncated or modified. In that case the ODS OUTPUT solution above will help (it provides the name of the original variable in an additional column), but some more work would be needed to relate that value to the truncated name.
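
For the underscore case, one way to split on the last underscore is to take the statistic suffix first and then cut it off the end of the name. This is only a sketch under assumptions: class2, body_height, mymeans2 and mymeans2_out are hypothetical names, and it relies on /autoname always appending an underscore plus the statistic name.

```sas
/* Hypothetical input: sashelp.class with one variable renamed to contain an underscore. */
data class2;
  set sashelp.class(rename=(height=body_height));
run;

proc means data=class2 noprint;
  class sex;
  var _numeric_;
  output out=mymeans2 n= mean= nmiss= p1= p5= / autoname;
run;

data mymeans2_out;
  set mymeans2(drop=_type_ _freq_);
  array numvars _numeric_;            /* only the statistic columns are numeric here */
  length var stat _name $32;
  do _t = 1 to dim(numvars);
    _name = vname(numvars[_t]);
    stat  = scan(_name, -1, '_');     /* text after the last underscore */
    /* everything before the last underscore, e.g. body_height from body_height_Mean */
    var   = substr(_name, 1, length(_name) - length(stat) - 1);
    value = numvars[_t];
    output;
  end;
  keep sex var stat value;
run;
```

Because the suffix is measured rather than searched for, underscores inside the variable name itself are left alone.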

I also drop _TYPE_ and _FREQ_ to simplify the array definition; if you need them, you'd need a bit of extra code to keep them on each output row while excluding them from the array, so they aren't output as statistics themselves.
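
As an alternative that avoids parsing names altogether (not part of the approach above, and assuming SAS 9.3 or later, where PROC MEANS has the STACKODSOUTPUT option), the ODS table can be stacked to one row per variable and then flipped with PROC TRANSPOSE. A rough sketch with no CLASS variable; the dataset names stacked and mymeans_t are made up:

```sas
/* Sketch: one row per analysis variable via STACKODSOUTPUT, then transpose. */
ods output summary=stacked;
proc means data=sashelp.class stackodsoutput n min max mean std nmiss p1 p99;
  var _numeric_;
run;

/* With no VAR statement, PROC TRANSPOSE transposes all numeric columns, */
/* i.e. the statistics; ID turns each analysis variable into a column.   */
proc transpose data=stacked out=mymeans_t name=_STAT_;
  id Variable;
run;
```

The result should have one row per statistic, with _STAT_ holding the statistic name and one column per analysis variable; a _LABEL_ column may also appear, carrying the labels of the ODS columns.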

Joe answered Apr 29 '23 18:04


This paper has an excellent discussion of the exact issue you describe, along with macro code to output a dataset fitting your description.

A Better Means — The ODS Data Trap

Update: I've discovered that there is a more recent paper that "presents a revised version of the macro supporting additional features and eliminating a surprising error." This is the updated solution:

Solve the SAS® ODS Data Trap in PROC MEANS

The macro appears well designed and avoids a wide variety of possible issues. The contortions used to create the output dataset involve calls to proc means (of course), proc sql, proc contents, and proc datasets and extensive use of the macro language architecture, and a description of them would probably not be instructive in this answer. I don't claim to understand it entirely myself.

However, once you have compiled the macro you should be able to create your desired dataset with one simple statement.

%better_means(data=MyDataSet)

Now that I've found this convenient solution I may start to use it myself.

Michael Richardson answered Apr 29 '23 19:04