Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Split large SAS dataset into smaller datasets

Tags:

split

macros

sas

I need some assistance with splitting a large SAS dataset into smaller datasets.

Each month I'll have a dataset containing a few million records. This number will vary from month to month. I need to split this dataset into multiple smaller datasets containing 250,000 records each. For example, if I have 1,050,000 records in the original dataset then I need the end result to be 4 datasets containing 250,000 records and 1 dataset containing 50,000 records.

From what I've been looking at it appears this will require using macros. Unfortunately I'm fairly new to SAS (unfamiliar with using macros) and don't have a lot of time to accomplish this. Any help would be greatly appreciated.

like image 470
user2941280 Avatar asked Jan 29 '14 18:01

user2941280


3 Answers

Building on Joe's answer, maybe you could try something like this :

%MACRO SPLIT(DATASET);

%LET DATASET_ID = %SYSFUNC(OPEN(&DATASET.));
%LET NOBS = %SYSFUNC(ATTRN(&DATASET__ID., NLOBS));
%LET NB_DATASETS = %SYSEVALF(&NOBS. / 250000, CEIL);

DATA 
  %DO I=1 %TO &NB_DATASETS.;
    WANT&I. 
  %END;;

  SET WANT;

  %DO I=1 %TO &NB_DATASETS.;

    %IF &I. > 1 %THEN %DO; ELSE %END; IF _N_ LE 2.5E5 * &I. THEN OUTPUT WANT&I.;

  %END;
RUN;
%MEND SPLIT;    
like image 87
Karim L Avatar answered Dec 13 '22 09:12

Karim L


You can do it without macros at all, if you don't mind asking for datasets that may not exist, and have a reasonable bound on things.

data want1 want2 want3 want4 want5 want6 want7 want8 want9;
if _n_ le 2.5e5 then output want1;
else if _n_ le 5e5 then output want2;
else if _n_ le 7.5e5 then output want3;
... etc....
run;

Macros would make that more efficient to program and cleaner to read, but wouldn't change how it actually runs in reality.

like image 35
Joe Avatar answered Dec 13 '22 09:12

Joe


You can do it without macros, using CALL EXECUTE(). It creates SAS-code as text strings and then executes it, after your "manually written" code completed.

data _null_;
    if 0 then set have nobs=n;
    do i=1 to ceil(n/250000);
        call execute (cats("data want",i)||";");
        call execute ("set have(firstobs="||(i-1)*250000+1||" obs="||i*250000||");");
        call execute ("run;");
    end;
run;
like image 38
Dmitry Shopin Avatar answered Dec 13 '22 08:12

Dmitry Shopin