I have a short question: if we create a SAS dataset, say Sample.sas7bdat, that already exists, will the code take more time to execute (because it has to overwrite the existing dataset) than it would if the dataset did not already exist?
data sample;
.....
.....
run;
I did some research on the internet but could not find a satisfactory answer. To me it seems the code should take a little extra time, though I'm not sure how much impact it would have on a 10 GB dataset.
To improve the performance of a SAS job, reduce the number of times SAS accesses disk or tape devices. You can reduce the number of data accesses by processing more data each time a device is accessed, for example by tuning the BUFNO=, BUFSIZE=, CATCACHE=, and COMPRESS= system options.
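A minimal sketch of that tuning, assuming a large input dataset named work.big_source; the option values are illustrative assumptions, not recommended defaults:

/* Process more data per I/O by enlarging and multiplying the
   buffers, and compress the output dataset. The values below
   are assumptions to tune against your own workload. */
options bufno=10 bufsize=64k compress=yes;

data work.big_copy;
    set work.big_source;   /* hypothetical large input dataset */
run;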
1) Read only the data that is needed from external data files.
2) Minimize the number of times a large dataset is read by subsetting it in a single DATA step.
3) Use the KEEP= or DROP= dataset options to retain only the desired variables.
4) Use WHERE statements to subset data (a sketch combining 3 and 4 follows this list).
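A minimal sketch combining points 3) and 4); the dataset and variable names are assumptions for illustration:

/* Keep only the two variables of interest and subset with WHERE,
   so unneeded columns and rows are never processed downstream. */
data work.subset;
    set work.big_source(keep=id amount);
    where amount > 100;
run;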
In SAS, you can use either the MERGE statement or the UPDATE statement in a DATA step to update the values of observations in a master dataset. Both statements must be used with a BY statement, which names the key variable(s); both input datasets must first be sorted by that key, for example with the SORT procedure.
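A minimal sketch of the UPDATE approach, with hypothetical dataset and variable names:

/* Sort both datasets by the key, then apply transactions to the
   master; non-missing values in work.trans overwrite the master. */
proc sort data=work.master;
    by id;
run;

proc sort data=work.trans;
    by id;
run;

data work.master;
    update work.master work.trans;
    by id;
run;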
You could test this yourself fairly easily, with a few caveats that I'll note as we go.
Here's an example of my test. This is a 100-million-row dataset with two 8-byte numerics, so 1.6 GB.
First, the results. I see a difference of a few seconds. Why? SAS performs a few extra operations when replacing a dataset:
Write the new dataset to a temporary file
Delete the old dataset
Rename the temporary file to the new dataset name
On some OSs this seems to be faster than others; I've found desktop Windows to be fairly slow about this, compared to Unix or even Windows Server, which is pretty quick. I'm guessing Windows is more careful about deleting than simply changing a file-system pointer, but I don't really know. It's certainly not copying the whole file over from the utility directory (the elapsed time is nowhere near enough for that). I also suspect write caching is still giving the new datasets a bit of a boost, particularly since the time for all datasets grows as I write. The difference is probably only about a second or so; the comparison between _REP in iteration 2 and _NEW in iteration 3 seems the most reasonable to me.
Iteration 1 _NEW=7.26999998099927 _REP=12.9079999922978
Iteration 2 _NEW=10.0119998454974 _REP=11.0789999961998
Iteration 3 _NEW=10.1360001564025 _REP=15.3819999695042
Iteration 4 _NEW=14.7720000743938 _REP=17.4649999142056
Iteration 5 _NEW=16.2560000418961 _REP=19.2009999752044
Notice that the first iteration's _NEW time is far faster than the rest, and that overall time increases as you go (as write caching is less and less able to keep up). I suspect that if you let it continue (or used a still larger file, which I don't have time for right now) you might see even more consistent times. I'm also not sure what happens with write caching when a write-cached file is deleted; it's possible SAS has to wait for the cache to flush to disk before performing the delete, or something similar. You could verify that by waiting 30 seconds between the _NEW and _REP steps (see the sketch after the code below).
The code:
%macro test_me(iter=1);
    %do _i=1 %to &iter.;
        %let start = %sysfunc(time());

        /* First DATA step: creates a brand-new dataset */
        data test&_i.;
            do x = 1 to 1e8;
                y = x**2;
                output;
            end;
        run;

        %let mid = %sysfunc(time());

        /* Second DATA step: replaces the dataset just created */
        data test&_i.;
            do x = 1 to 1e8;
                y = x**2;
                output;
            end;
        run;

        %let end = %sysfunc(time());

        %let _new = %sysevalf(&mid.-&start.);   /* time to create new dataset */
        %let _rep = %sysevalf(&end.-&mid.);     /* time to replace it         */
        %put Iteration &_i. &=_new. &=_rep.;
    %end;

    /* Clean up all WORK datasets created by the test */
    proc datasets nolist kill;
    quit;
%mend test_me;

options nosource nonotes nomprint nosymbolgen;
%test_me(iter=5);
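If you want to check the write-caching guess from above, a minimal sketch of the variation is to pause between the two DATA steps inside the macro. The second argument to SLEEP sets the unit to seconds explicitly, since the default unit varies by host:

/* Insert between the _NEW step and the _REP step inside the macro */
%let rc = %sysfunc(sleep(30, 1));   /* wait 30 seconds so the cache can flush */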