Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

The most efficient way to reorder columns in SAS

Tags:

sorting

sas

I have a data temp that contains variables A1, A2, ... Amax. I want to rearrange its internal order so that once we open it, it will show A2, A5, .....

I know there's couple of ways to do this. What I usually to is to use retain statement.

If the number of observation is large (N>1,000,000), what's the most efficient way to finish this? A data step with retain or proc sql or something else?

The most efficient means the least processing time for me. I will appreciate if you can also provide the analysis of the memory and disk space needed for each method.

like image 723
Lovnlust Avatar asked Feb 11 '23 23:02

Lovnlust


1 Answers

A couple of years ago I attended a SAS conference at one of their main offices in the UK. They held a workshop very similar to your question where they looked into the speed of different techniques of reordering and merging/joining datasets.

The 3 ways which SAS presented where:

  • Traditional Datastep (Retain)

  • Proc SQL (Create Table)

  • Hash Tables (Specifically around merging tables not necessarily re-ordering)

The interesting outcome was that unless you're talking about a very large dataset the retain and create table were evenly matched.

Obviously if you want to merge/join and re-order then proc sql is the way to go as using a data step to merge requires you to sort first, whereas a proc sql doesn't. And if it really is big, Hash tables can save 90% processing time on merges/joins.

One of the other outcomes as part of the group discussion is when using large datasets the improved IO performance of Views when re-ordering:

proc sql noprint;
  create view set2 as
  select title, *
  from set1;
quit;

** OR;

data set2 / view=set2;
  retain title salary name;
  set set1;
run;

(Referenced from here: http://www2.sas.com/proceedings/sugi27/p019-27.pdf)

like image 131
jaymarvels Avatar answered Feb 19 '23 01:02

jaymarvels