I've also posted this question on runsubmit.com, a site outside the SE network for SAS-related questions.
At work there are two SAS servers I use. When I transfer a SAS dataset from one to the other via PROC UPLOAD, it runs at about 2.5 MB/s. However, if I map a drive on one server as a network drive and copy and paste the file across, it runs much faster, around 80 MB/s (over the same gigabit connection).
Could anyone suggest what might be causing this and what I can do either to fix it or as a workaround?
There is also a third server I use that cannot map network drives to the other two: SAS is the only available means of transferring files from it, so I need a SAS-based solution. Although individual transfers from this server run at 2.5 MB/s, I've found that it's possible to have several transfers going in parallel, each at 2.5 MB/s.
Would SAS FTP via filename statements and a data step be any faster than PROC UPLOAD? I might try that next, but I would prefer to avoid it: we only have SAS 9.1.3, so SFTP isn't available.
Update - test results
I initially tried the following:
local session -> remote session on source server -> n remote sessions on destination server -> Recombine n pieces on destination server
Although this resulted in n simultaneous transfers, each ran at 1/n of the original rate, probably due to a CPU bottleneck on the source server. To get n times the bandwidth of a single transfer, I had to set it up as:
local session -> n remote sessions on source server -> 1 remote session each on destination server -> Recombine n pieces on destination server
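The second topology can be sketched with SAS/CONNECT roughly as below, for n=2. Everything specific here is an assumption for illustration: the host names, `user=_prompt_` authentication, the source library `perm.big`, and the 5,000,000-row split point are all placeholders, not details from my setup.

```sas
/* Hypothetical sketch: two parallel sessions on the source server,
   each cascading into its own destination session and uploading
   half the data. Host names and split points are assumptions. */
options comamid=tcp;
%let src1=sourcehost;  %let src2=sourcehost;
signon src1 user=_prompt_;
signon src2 user=_prompt_;

rsubmit src1 wait=no;                               /* asynchronous */
   data piece1; set perm.big(obs=5000000); run;     /* first half   */
   options comamid=tcp;
   %let dest1=desthost;
   signon dest1 user=_prompt_;
   rsubmit dest1;
      proc upload data=piece1 out=work.piece1; run; /* src -> dest  */
   endrsubmit;
   signoff dest1;
endrsubmit;

rsubmit src2 wait=no;
   data piece2; set perm.big(firstobs=5000001); run; /* second half */
   options comamid=tcp;
   %let dest2=desthost;
   signon dest2 user=_prompt_;
   rsubmit dest2;
      proc upload data=piece2 out=work.piece2; run;
   endrsubmit;
   signoff dest2;
endrsubmit;

waitfor _all_ src1 src2;       /* block until both transfers finish */
signoff _all_;

/* Then, in a session on the destination server:
   data work.big; set work.piece1 work.piece2; run;                 */
```

Because both RSUBMIT blocks use wait=no, the two uploads run concurrently, which is what gets around the per-transfer 2.5 MB/s ceiling.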
SAS FTP code
filename source ftp '\dir1\dir2'
host='servername'
binary dir
user="&username" pass="&password";
%let work = %sysfunc(pathname(work));
filename target "&work";
data _null_;
infile source('dataset.sas7bdat') truncover;
input;
file target('dataset.sas7bdat');
put _infile_;
run;
My understanding of PROC UPLOAD is that it is performing a record-by-record upload of the file along with some conversions and checks, which is helpful in some ways, but not particularly fast. PROC COPY, on the other hand, will happily copy the file without being quite as careful to maintain things like indexes and constraints; but it will be much faster. You just have to define a libref for your server's files.
For example, I sign on to my server and assign it the 'unix' nickname. Then I define a library on it:
libname uwork server=unix slibref=work;
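For completeness, the sign-on that creates the 'unix' session might look something like the following sketch; the host name is a placeholder, and `user=_prompt_` is just one of several authentication options.

```sas
options comamid=tcp;
%let unix=unixhost;        /* hypothetical host name behind the nickname */
signon unix user=_prompt_; /* prompts for userid/password at sign-on     */

/* Remote Library Services: UWORK in the local session
   points at the remote session's WORK library */
libname uwork server=unix slibref=work;
```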
Then I execute the following PROC COPY code, using a randomly generated 1e7 row datafile. Following that, I also RSUBMIT a PROC UPLOAD for comparison purposes.
48 proc copy in=work out=uwork;
NOTE: Writing HTML Body file: sashtml.htm
49 select test;
50 run;
NOTE: Copying WORK.TEST to UWORK.TEST (memtype=DATA).
NOTE: There were 10000000 observations read from the data set WORK.TEST.
NOTE: The data set UWORK.TEST has 10000000 observations and 1 variables.
NOTE: PROCEDURE COPY used (Total process time):
real time 13.07 seconds
cpu time 1.93 seconds
51 rsubmit;
NOTE: Remote submit to UNIX commencing.
3 proc upload data=test;
4 run;
NOTE: Upload in progress from data=WORK.TEST to out=WORK.TEST
NOTE: 80000000 bytes were transferred at 1445217 bytes/second.
NOTE: The data set WORK.TEST has 10000000 observations and 1 variables.
NOTE: Uploaded 10000000 observations of 1 variables.
NOTE: The data set WORK.TEST has 10000000 observations and 1 variables.
NOTE: PROCEDURE UPLOAD used:
real time 55.46 seconds
cpu time 42.09 seconds
NOTE: Remote submit to UNIX complete.
PROC COPY is still not quite as fast as an OS-level copy, but it's much closer. PROC UPLOAD is actually quite a bit slower than even a plain data step, because of the checking it does; in fact, here the data step is comparable to PROC COPY due to the simplicity of the dataset (and probably because of block sizes: I have a 64k block size, so the data step falls back to the server's 16k block size while PROC COPY presumably does not).
52 data uwork.test;
53 set test;
54 run;
NOTE: There were 10000000 observations read from the data set WORK.TEST.
NOTE: The data set UWORK.TEST has 10000000 observations and 1 variables.
NOTE: DATA statement used (Total process time):
real time 12.60 seconds
cpu time 1.66 seconds
In general, in real-world situations, PROC COPY is faster than a data step, and both are faster than PROC UPLOAD - unless the complexities of your situation require PROC UPLOAD (I have never seen a reason to use it, but I know it is possible). I suspect PROC UPLOAD was more necessary in older versions of SAS and is largely unneeded now, though my experience with different hardware setups is fairly limited, so this may not apply to your situation.