Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Difference between PROC SQL and sqldf

Tags:

sql

r

sqldf

sas

I am in the midst of re-writing a SAS program I normally use into R. Many statements in the SAS program are written with proc sql, so for ease of programming I am replicating those statements with sqldf in R. All of the sql code works the same with sqldf as it does in SAS, except for this statement:

SAS:

proc sql;
create table merged32 as
select max(ctf) as ctf,
       max(ctms) as ctms,
       max(site_name) as site_name,
       provider_id, npi,
       first_name, middle_name, last_name, specialty,
       address_1, city, state, site_ct, zip, site_key
    from merged2
    group by 9,10,11,12,14,15;
quit;
run;

sqldf:

sqldf("select max(ctf) as ctf,
              max(ctms) as ctms,
              max(site_name) as site_name,
              provider_id, npi,
              first_name, middle_name, last_name, specialty,
              address_1, city, state, site_ct, zip, site_key
       from merged2
       group by 9,10,11,12,14,15")

In SAS, it returns a dataset with 1878 rows; in sqldf, a dataframe with 1375.

Are there any major differences between proc sql and sqldf that could be causing this, or in general that need to be considered? I didn't provide data because 1) the datasets are huge, and 2) I'm more just interested in knowing the differences between the two sql systems.

like image 788
user1445246 Avatar asked Nov 04 '22 19:11

user1445246


1 Answers

Not sure about sqldf, but for meaningful results, you should group by all your key variables in SQL.

like image 56
stevepastelan Avatar answered Nov 07 '22 21:11

stevepastelan