Suppose I am trying to sum up one variable (call it var_1) in a very large dataset (nearly a terabyte). The dataset is both long and wide. My code would look like this:
PROC MEANS DATA=my_big_dataset SUM;
VAR var_1;
RUN;
Would I get any performance gain at all by using the KEEP= option on the dataset being read? That is:
PROC MEANS DATA=my_big_dataset (KEEP=var_1) SUM;
VAR var_1;
RUN;
In terms of disk I/O, I imagine that each record must be read in its entirety no matter what. But perhaps less memory needs to be allocated to read the records. Any advice is appreciated.
Yes, it does make a difference. Most of the time the difference is small, but with very wide or very long datasets you will start to see some benefit.
Search for keep= on the link below...
http://support.sas.com/techsup/technote/ts298.html
If you're having performance issues, this may shave fractions of a second, or a few seconds, off what you are doing, but it's not going to cut your processing time in half. Look for other optimization techniques if you need that.
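One way to check the effect on your own data is to run both versions with the FULLSTIMER system option turned on and compare the real time, CPU time, and memory figures that SAS writes to the log. This is only a sketch: it reuses the dataset and variable names from the question, and the numbers you get will depend on your storage, engine, and operating system file caching, so it is worth repeating the runs in both orders.

```sas
/* Write detailed resource usage (real time, CPU time, memory) to the log. */
OPTIONS FULLSTIMER;

/* Run 1: no KEEP= option on the input dataset. */
PROC MEANS DATA=my_big_dataset SUM;
  VAR var_1;
RUN;

/* Run 2: KEEP= limits the variables passed to the procedure to var_1. */
PROC MEANS DATA=my_big_dataset (KEEP=var_1) SUM;
  VAR var_1;
RUN;

/* Restore the default, shorter log statistics. */
OPTIONS NOFULLSTIMER;
```

Compare the notes that follow each PROC MEANS step in the log. If the two steps report nearly the same real time, the job is most likely limited by disk I/O, which matches the point above that KEEP= helps only modestly.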