I want to sort data in SAS data step. What exactly I mean is: the work of proc sort should be done in data step. Is there any solution?
No, you can't sort your data in a Data Step. With a Data Step, you can only create new datasets and new variables, filter observations, and select columns. You need the PROC SORT procedure to sort your data.
You can specify a sort order for each column in the ORDER BY clause. libname sql 'SAS-library'; proc sql outobs=12; title 'World Topographical Features'; select Name, Type from sql. features order by Type desc, Name; Note: The ASC keyword is optional because the PROC SQL default sort order is ascending.
What Does the SORT Procedure Do? The SORT procedure orders SAS data set observations by the values of one or more character or numeric variables. The SORT procedure either replaces the original data set or creates a new data set. PROC SORT produces only an output data set.
You can control the order in which variables are displayed in SAS output by using the ATTRIB statement. Use the ATTRIB statement prior to the SET, MERGE, or UPDATE statement in order for you to reorder the variables. Variables not listed in the ATTRIB statement retain their original position.
If you're looking for a data-step only solution, you can do the work of PROC SORT
with a hash table. The caveat is that you need enough memory to do it.
If you want to do a simple sort, you would load the hash table with the ordered:'yes'
option and output it to a new table. By default, ordered:yes
will sort the data in ascending order. You can specify descending
as well.
Simple Sort
data _null_;
/* Sets up PDV without loading the table */
if(0) then set sashelp.class;
/* Load sashelp.class into memory ordered by Height. Do not remove duplicates. */
dcl hash sortit(dataset:'sashelp.class', ordered:'yes', multidata:'yes');
sortit.defineKey('Height'); * Order by height;
sortit.defineData(all:'yes'); * Keep all variables in the output dataset;
sortit.defineDone();
/* Output to a dataset called class_sorted */
sortit.Output(dataset:'class_sorted');
run;
De-duping
To remove duplicates, do the exact same operation, except remove the multidata
option. In the below table, observations (8, 9) and (15, 16) are duplicates of each other. Observations 9 and 16 will be eliminated.
data _null_;
/* Sets up PDV without loading the table */
if(0) then set sashelp.class;
/* Load sashelp.class into memory ordered by Height. Do not keep duplicates. */
dcl hash sortit(dataset:'sashelp.class', ordered:'yes');
sortit.defineKey('Height'); * Order by height;
sortit.defineData(all:'yes'); * Keep all variables in the output dataset;
sortit.defineDone();
/* Output to a dataset called class_sorted */
sortit.Output(dataset:'class_sorted');
run;
Stu beat me to it, but provided that your dataset contains a unique key, and you can fit the whole thing in memory, you can use a hash sort, e.g.:
data _null_;
if 0 then set sashelp.class;
declare hash h(dataset:"sashelp.class",ordered:"a");
rc = h.definekey("age","sex","name");
rc = h.definedata(ALL:'yes');
rc = h.definedone();
rc = h.output(dataset:"class_sorted");
stop;
run;
If you are really determined to avoid using any built-in sort methods, a particularly silly approach is to load the whole dataset into a series of temporary arrays, sort the arrays using a hand-coded algorithm, and export again:
https://codereview.stackexchange.com/questions/79952/quicksort-in-sas-for-sorting-datasets
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With