Original Question:
By default Proc Means outputs N, MIN, MEAN, MAX and STD in the output dataset. How do I add, NMISS, P1, P5 etc to this list?
Additional info 1:
I want statistics on all numeric variables in my dataset, so I use _numeric_ in the var specification.
I want each statistic to be in a row, with the variables as columns.
Obs  _TYPE_  _FREQ_  _STAT_      var1  var2  var3  etc
  1       0   84829  N       84826.00
  2       0   84829  MIN         0.00
  3       0   84829  MAX      5000.00
  4       0   84829  MEAN      151.22
  5       0   84829  STD      1989.47
  6       0   84829  NMISS          3
  7       0   84829  P1          2.00
  8       0   84839  P99      4999.00
How do I do this?
Thanks!
Assuming you are using the OUTPUT statement in PROC MEANS (and not ODS OUTPUT), you can control which statistics go into that dataset like so:
proc means data=sashelp.class;
var age;
class sex;
output out=mymeans nmiss= P1= P5= /autoname;
run;
The full list of statistic names is available in the PROC MEANS documentation under "statistics keyword".
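As a rough sketch, the statistics listed in the question could be requested for every numeric variable as shown below. It assumes sashelp.class as stand-in data; NOPRINT is added only to suppress the printed report and is not part of the original answer.

```sas
/* Sketch: request the default statistics plus NMISS, P1 and P99 */
/* for all numeric variables. sashelp.class is a stand-in dataset. */
proc means data=sashelp.class noprint;
  var _numeric_;
  /* AUTONAME builds output names such as Age_Mean, Age_NMiss, Age_P1 */
  output out=mymeans n= min= max= mean= std= nmiss= p1= p99= / autoname;
run;
```

This produces one wide row (variable_statistic columns), not yet the one-row-per-statistic layout asked for.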
You can also achieve the same result (with a slightly different output format) with ODS OUTPUT.
ods output summary=mymeans;
ods trace on;
proc means data=sashelp.class nmiss p1 p5;
var age;
class sex;
run;
ods trace off;
ods output close;
ODS TRACE ON/OFF only shows the name of the table created (i.e., 'summary') in the log; it's not needed in production. With this approach you request statistics the same way you would for the output window: in the PROC MEANS statement.
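A variation worth trying, assuming SAS 9.3 or later: the STACKODSOUTPUT option on the PROC MEANS statement changes the ODS table so that each analysis variable gets its own row, with one column per statistic. The dataset name mymeans_stacked is made up for this sketch.

```sas
/* Sketch: STACKODSOUTPUT (SAS 9.3+) stacks the ODS summary table so  */
/* there is one row per analysis variable and one column per statistic */
ods output summary=mymeans_stacked;
proc means data=sashelp.class stackodsoutput n nmiss mean p1 p5;
  class sex;
  var _numeric_;
run;
```

That is still statistics-as-columns, so a transpose would be needed to get one row per statistic.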
Based on your edits, you want it transposed (one row per statistic). You can't get that directly, but the transposition isn't very hard.
proc means data=sashelp.class nmiss p1 p5;
class sex;
var _numeric_;
output out=mymeans n= mean= nmiss= p1= p5= /autoname ;
run;
data mymeans_out;
set mymeans(drop=_type_ _freq_);
by sex;
array numvars _numeric_;
format var stat $32.;
do _t = 1 to dim(numvars);
var=scan(vname(numvars[_t]),1,'_');
stat=scan(vname(numvars[_t]),-1,'_');
value = numvars[_t];
output;
end;
keep sex var stat value;
run;
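The data step above yields one row per sex, variable and statistic. To get the layout in the question (one row per statistic, one column per variable), a PROC TRANSPOSE step along these lines should work; mymeans_wide is a hypothetical output name.

```sas
/* PROC TRANSPOSE with a BY statement needs the data sorted by the BY variables */
proc sort data=mymeans_out;
  by sex stat;
run;

/* One row per sex and statistic; the values of VAR become the column names */
proc transpose data=mymeans_out out=mymeans_wide(drop=_name_);
  by sex stat;
  id var;       /* e.g. Age, Height, Weight for sashelp.class */
  var value;
run;
```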
This has a few limitations. If your variable names already contain underscores, the var=scan... line will need to be rewritten to use substr and find the last underscore, then var = substr(vname(...),1,position_of_last_underscore-1). Stat should be fine since it uses -1 (reverse direction). If your variable names might exceed ~23 characters, you may not get the exact variable name back, as it may be truncated or modified. In that case the ODS OUTPUT solution above will help (it provides the name of the original variable in an additional column), but some more work would be needed to relate that value to the truncated name.
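One way to handle variable names that already contain underscores, sketched under the assumption that every column in mymeans follows the AUTONAME pattern name_statistic: take the statistic with scan(..., -1, '_') and treat everything before that final underscore as the variable name. The helper variable _name is introduced here for illustration only.

```sas
data mymeans_out;
  set mymeans(drop=_type_ _freq_);
  by sex;
  array numvars _numeric_;
  length var stat _name $32;
  do _t = 1 to dim(numvars);
    _name = vname(numvars[_t]);
    /* the statistic is the piece after the last underscore */
    stat = scan(_name, -1, '_');
    /* everything before that last underscore is the variable name */
    var = substr(_name, 1, length(_name) - length(stat) - 1);
    value = numvars[_t];
    output;
  end;
  keep sex var stat value;
run;
```

This does not help with names that AUTONAME had to truncate; those still need to be matched up separately.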
I also drop _TYPE_ and _FREQ_ to simplify the array definition; if you need those, then you'd need to write a bit of code to exclude them from separately being output, and keep them.
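If _TYPE_ and _FREQ_ are needed, a minimal sketch is to leave them in the input, skip them inside the loop, and add them to the KEEP list. This assumes _TYPE_ is numeric (the default, i.e. the CHARTYPE option is not in use).

```sas
data mymeans_out;
  set mymeans;                 /* _type_ and _freq_ are not dropped here */
  by sex;
  array numvars _numeric_;     /* the array now includes _type_ and _freq_ */
  length var stat $32;
  do _t = 1 to dim(numvars);
    /* skip the two bookkeeping variables so they are not output as statistics */
    if upcase(vname(numvars[_t])) in ('_TYPE_', '_FREQ_') then continue;
    var   = scan(vname(numvars[_t]), 1, '_');
    stat  = scan(vname(numvars[_t]), -1, '_');
    value = numvars[_t];
    output;
  end;
  keep _type_ _freq_ sex var stat value;
run;
```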
This paper has an excellent discussion of the exact issue you describe, along with macro code to output a dataset fitting your description.
A Better Means — The ODS Data Trap
Update: I've discovered that there is a more recent paper that "presents a revised version of the macro supporting additional features and eliminating a surprising error." This is the updated solution:
Solve the SAS® ODS Data Trap in PROC MEANS
The macro appears well designed and avoids a wide variety of possible issues. The contortions used to create the output dataset involve calls to proc means (of course), proc sql, proc contents, and proc datasets, plus extensive use of the macro language, and a description of them would probably not be instructive in this answer. I don't claim to understand it entirely myself.
However, once you have compiled the macro you should be able to create your desired dataset with one simple statement.
%better_means(data=MyDataSet)
Now that I've found this convenient solution I may start to use it myself.