 

match two data.frames based on multiple columns

Tags: dataframe, r, match

My mind is blank at the moment. I would like to match and extract rows from a larger data.frame (df) based on the columns of a smaller data.frame (mdf). Where I'm getting stuck is that I want to match on multiple columns (two in this case). I have tried different approaches using e.g. merge, which, match and %in%, but none have succeeded.

# Dummy example

# Large df
df <- mtcars[1:6,1:3]
df$car_1 <- rownames(df)
df$car_2 <- rownames(tail(mtcars))

# df to match
mdf <- df[c("car_1","car_2")][3:6,]

rownames(df) <- NULL
rownames(mdf) <- NULL

The desired output would look something like:

 mpg cyl disp             car_1          car_2
22.8   4  108        Datsun 710 Ford Pantera L
21.4   6  258    Hornet 4 Drive   Ferrari Dino  
18.7   8  360 Hornet Sportabout  Maserati Bora
18.1   6  225           Valiant     Volvo 142E
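For comparison with the match/%in% attempts mentioned above, one possible base R sketch (not taken from the thread) collapses the two key columns into a single string per row, so that %in% compares whole (car_1, car_2) pairs rather than each column separately. The "\r" separator is an assumption: it only works if that character never occurs in the car names, otherwise two different pairs could produce the same key.

```r
# Rebuild the dummy data from the question
df <- mtcars[1:6, 1:3]
df$car_1 <- rownames(df)
df$car_2 <- rownames(tail(mtcars))
mdf <- df[c("car_1", "car_2")][3:6, ]
rownames(df) <- NULL
rownames(mdf) <- NULL

# Collapse the two key columns into one string per row.
# Assumption: "\r" does not occur in any car name, so keys cannot collide.
key_df  <- paste(df$car_1,  df$car_2,  sep = "\r")
key_mdf <- paste(mdf$car_1, mdf$car_2, sep = "\r")

# Keep the rows of df whose (car_1, car_2) pair appears in mdf
res <- df[key_df %in% key_mdf, ]
rownames(res) <- NULL
res
```

Unlike merge(), this keeps df's original row order and column order (mpg, cyl, disp, car_1, car_2), which is the layout of the desired output.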

This feels like it should be very straightforward.

Any pointer would be highly appreciated, thanks!

asked Oct 27 '14 20:10 by jO.





1 Answer

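How about merge(df, mdf, all.x = FALSE, all.y = TRUE)? When no by argument is given, merge() joins on all columns the two data frames have in common (here car_1 and car_2), so a row is kept only when both values match.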
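A self-contained run of the suggested call on the question's dummy data might look like the sketch below. The final column reordering is an addition of mine, only there to reproduce the layout of the desired output, because merge() places the key columns first.

```r
# Rebuild the question's dummy data
df <- mtcars[1:6, 1:3]
df$car_1 <- rownames(df)
df$car_2 <- rownames(tail(mtcars))
mdf <- df[c("car_1", "car_2")][3:6, ]
rownames(df) <- NULL
rownames(mdf) <- NULL

# No `by` given: merge() joins on the shared columns car_1 and car_2
res <- merge(df, mdf, all.x = FALSE, all.y = TRUE)

# merge() puts the key columns first; reorder to match the desired layout
res <- res[c("mpg", "cyl", "disp", "car_1", "car_2")]
res
```

Because every pair in mdf has a match in df here, the default inner join merge(df, mdf) returns the same four rows; all.y = TRUE only makes a difference when mdf contains pairs that are absent from df, which are then kept with NA in mpg, cyl and disp. Note also that merge() sorts the result by the key columns by default (sort = TRUE).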

Edit: If you have different column names, you can specify which ones to merge on, e.g.:

names(mdf) <- c("car_3", "car_4")
merge(df, mdf, by.x = c("car_1", "car_2"), by.y = c("car_3", "car_4"), 
      all.x = FALSE, all.y = TRUE)
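To see what all.y = TRUE does with the renamed columns, the sketch below appends one hypothetical pair, ("Mazda RX4", "Fiat 128"), to mdf. This pair is invented for illustration: "Mazda RX4" exists in df, but paired with a different car_2, so it has no match on both columns and comes back with NA values.

```r
# Rebuild the question's dummy data
df <- mtcars[1:6, 1:3]
df$car_1 <- rownames(df)
df$car_2 <- rownames(tail(mtcars))
mdf <- df[c("car_1", "car_2")][3:6, ]
rownames(df) <- NULL
rownames(mdf) <- NULL

# Key columns with different names in mdf, as in the answer
names(mdf) <- c("car_3", "car_4")

# Hypothetical extra pair: car_1 matches a row of df, but car_2 does not
mdf <- rbind(mdf, data.frame(car_3 = "Mazda RX4", car_4 = "Fiat 128",
                             stringsAsFactors = FALSE))

res <- merge(df, mdf, by.x = c("car_1", "car_2"), by.y = c("car_3", "car_4"),
             all.x = FALSE, all.y = TRUE)
res
```

The result uses df's key names (car_1, car_2) and has five rows: the four matches plus the unmatched pair with NA in mpg, cyl and disp. Dropping all.y = TRUE would drop that row, giving an inner join.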
answered Nov 03 '22 02:11 by Kara Woo
