Difference between PROC SQL and sqldf

Question

I am in the midst of re-writing a SAS program I normally use into R. Many statements in the SAS program are written with proc sql, so for ease of programming I am replicating those statements with sqldf in R. All of the sql code works the same with sqldf as it does in SAS, except for this statement:

SAS:

proc sql;
create table merged32 as
select max(ctf) as ctf,
       max(ctms) as ctms,
       max(site_name) as site_name,
       provider_id, npi,
       first_name, middle_name, last_name, specialty,
       address_1, city, state, site_ct, zip, site_key
    from merged2
    group by 9,10,11,12,14,15;
quit;
run;

sqldf:

sqldf("select max(ctf) as ctf,
              max(ctms) as ctms,
              max(site_name) as site_name,
              provider_id, npi,
              first_name, middle_name, last_name, specialty,
              address_1, city, state, site_ct, zip, site_key
       from merged2
       group by 9,10,11,12,14,15")

In SAS, it returns a dataset with 1878 rows; in sqldf, a dataframe with 1375.

Are there any major differences between proc sql and sqldf that could be causing this, or in general that need to be considered? I didn't provide data because 1) the datasets are huge, and 2) I'm more just interested in knowing the differences between the two sql systems.

stevepastelan · Accepted Answer

Not sure about sqldf, but for meaningful results, you should group by all your key variables in SQL.

Difference between PROC SQL and sqldf

Tags:

sql

r

sqldf

sas

user1445246

1 Answers

stevepastelan

Recent Activity

Donate For Us

Difference between PROC SQL and sqldf

Tags:

sql

r

sqldf

sas

user1445246

1 Answers

stevepastelan

Related questions

Recent Activity

Donate For Us