I have a data temp
that contains variables A1, A2, ... Amax. I want to rearrange its internal order so that once we open it, it will show A2, A5, .....
I know there's couple of ways to do this. What I usually to is to use retain
statement.
If the number of observation is large (N>1,000,000), what's the most efficient way to finish this? A data step with retain
or proc sql
or something else?
The most efficient means the least processing time for me. I will appreciate if you can also provide the analysis of the memory and disk space needed for each method.
A couple of years ago I attended a SAS conference at one of their main offices in the UK. They held a workshop very similar to your question where they looked into the speed of different techniques of reordering and merging/joining datasets.
The 3 ways which SAS presented where:
Traditional Datastep (Retain)
Proc SQL (Create Table)
Hash Tables (Specifically around merging tables not necessarily re-ordering)
The interesting outcome was that unless you're talking about a very large dataset the retain and create table were evenly matched.
Obviously if you want to merge/join and re-order then proc sql is the way to go as using a data step to merge requires you to sort first, whereas a proc sql doesn't. And if it really is big, Hash tables can save 90% processing time on merges/joins.
One of the other outcomes as part of the group discussion is when using large datasets the improved IO performance of Views when re-ordering:
proc sql noprint;
create view set2 as
select title, *
from set1;
quit;
** OR;
data set2 / view=set2;
retain title salary name;
set set1;
run;
(Referenced from here: http://www2.sas.com/proceedings/sugi27/p019-27.pdf)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With