I am in the midst of re-writing a SAS program I normally use into R. Many statements in the SAS program are written with proc sql, so for ease of programming I am replicating those statements with sqldf in R. All of the sql code works the same with sqldf as it does in SAS, except for this statement:
SAS:
proc sql;
create table merged32 as
select max(ctf) as ctf,
max(ctms) as ctms,
max(site_name) as site_name,
provider_id, npi,
first_name, middle_name, last_name, specialty,
address_1, city, state, site_ct, zip, site_key
from merged2
group by 9,10,11,12,14,15;
quit;
run;
sqldf:
sqldf("select max(ctf) as ctf,
max(ctms) as ctms,
max(site_name) as site_name,
provider_id, npi,
first_name, middle_name, last_name, specialty,
address_1, city, state, site_ct, zip, site_key
from merged2
group by 9,10,11,12,14,15")
In SAS, it returns a dataset with 1878 rows; in sqldf, a dataframe with 1375.
Are there any major differences between proc sql and sqldf that could be causing this, or in general that need to be considered? I didn't provide data because 1) the datasets are huge, and 2) I'm more just interested in knowing the differences between the two sql systems.
Not sure about sqldf, but for meaningful results, you should group by all your key variables in SQL.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With