
R: merge three data frames without forming the Cartesian product

Tags: merge, dataframe, r

I have the following data frames a, b, and c:

Year<-rep(c("2002","2003"),1)
Crop<-c("TTT","RRR")
a<-data.frame(Year,Crop)

Year<-rep(c("2002","2003"),2)
ProductB<-c("A","A","B","B")
b<-data.frame(Year,ProductB)

Year<-rep(c("2002","2003"),3)
Location<-c("XX","XX","YY","YY","ZZ","ZZ")
c<-data.frame(Year,Location)

I want to combine them. When I use the merge function, I get the Cartesian product, which is not what I want.

d<-merge(a,b,by="Year")
e<-merge(d,c,by="Year")
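
As a quick illustration of where the extra rows come from, here is a small sketch that rebuilds the same a, b and c so it runs on its own and then counts the rows per Year. Merging on Year alone pairs every row of one table with every row of the other that shares that Year.

```r
# Same data as above, rebuilt so the sketch is self-contained.
a <- data.frame(Year = c("2002", "2003"), Crop = c("TTT", "RRR"))
b <- data.frame(Year = rep(c("2002", "2003"), 2),
                ProductB = c("A", "A", "B", "B"))
c <- data.frame(Year = rep(c("2002", "2003"), 3),
                Location = c("XX", "XX", "YY", "YY", "ZZ", "ZZ"))

e <- merge(merge(a, b, by = "Year"), c, by = "Year")

# Each Year has 1 row in a, 2 in b and 3 in c, so joining on Year alone
# produces every combination: 1 * 2 * 3 = 6 rows per Year.
table(e$Year)
nrow(e)  # 12, not the 6 rows wanted
```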

I would like the data frame to look like this:

Year   Crop   ProductB   Location
2002   TTT    A          XX
2002   NA     B          YY
2002   NA     NA         ZZ
2003   RRR    A          XX
2003   NA     B          YY
2003   NA     NA         ZZ

Is this possible? Thanks for your help

user2386786 asked Mar 20 '23 12:03


1 Answer

Here's one way using data.table.

library(data.table) ## 1.9.2
# (1) number the rows within each Year
setDT(a)[, GRP := seq_len(.N), by = Year]
setDT(b)[, GRP := seq_len(.N), by = Year]
setDT(c)[, GRP := seq_len(.N), by = Year]
# (2) full outer join on the (Year, GRP) pair
merge(a, merge(b, c, by = c("Year", "GRP"), all = TRUE),
      by = c("Year", "GRP"), all = TRUE)

#    Year GRP Crop ProductB Location
# 1: 2002   1  TTT        A       XX
# 2: 2002   2   NA        B       YY
# 3: 2002   3   NA       NA       ZZ
# 4: 2003   1  RRR        A       XX
# 5: 2003   2   NA        B       YY
# 6: 2003   3   NA       NA       ZZ
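  • (1) - setDT converts the data.frame to a data.table by reference, and we then create a new column GRP that numbers the rows within each Year. This makes every (Year, GRP) combination unique within each table.
  • (2) - We merge on the two columns Year and GRP, with all=TRUE so that rows without a match are kept and filled with NA.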

.N is a built-in data.table variable that holds the number of rows in the current group.
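
For comparison, the same idea can be sketched in base R without data.table. This is an alternative technique, not the code from the answer above: ave() stands in for the grouped := step, Reduce() chains the merges, and add_grp is a hypothetical helper name introduced only for this example.

```r
# Same data as in the question, rebuilt so the sketch is self-contained.
a <- data.frame(Year = c("2002", "2003"), Crop = c("TTT", "RRR"))
b <- data.frame(Year = rep(c("2002", "2003"), 2),
                ProductB = c("A", "A", "B", "B"))
c <- data.frame(Year = rep(c("2002", "2003"), 3),
                Location = c("XX", "XX", "YY", "YY", "ZZ", "ZZ"))

# add_grp() (hypothetical helper) numbers the rows within each Year,
# playing the role of GRP := 1:.N, by=Year.
add_grp <- function(df) {
  df$GRP <- ave(seq_along(df$Year), df$Year, FUN = seq_along)
  df
}

# Full outer join of all three tables on the (Year, GRP) pair.
res <- Reduce(
  function(x, y) merge(x, y, by = c("Year", "GRP"), all = TRUE),
  lapply(list(a, b, c), add_grp)
)
res
```

Using Reduce() here means a fourth or fifth table can be added to the list without nesting another merge call.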

Arun answered Apr 25 '23 17:04