merge 3 data.frames by column names

Tags:

dataframe

r

I have three independent data.frames. They have the same number of columns, the same number of rows, and the same column names. I'm trying to merge the three data.frames according to column names. I'm using the following code, which was written to merge two data.frames and return the number of matches for each column:

Merged_DF = sapply(names(DF1),function(n) nrow(merge(DF1, DF2, by=n)))

The problem is that this example handles two data.frames, while in my case I have three. How can I modify the code to merge three data.frames instead of two? I tried simply adding the third data.frame to the call, but it does not work:

  Merged_DF = sapply(names(DF1),function(n) nrow(merge(DF1, DF2, DF3,  by=n)))

It returns the following error:

 Error in fix.by(by.x, x) :  'by' must specify column(s) as numbers, names or logical

Ex:

DF1

 G1  G2  G3
  a   b   f
  b   c   a
  c   d   b

DF2

 G1  G2  G3
  A   b   f
  b   c   a
  h   M   b

DF3

 G1  G2  G3
  a   b   f
  b   l   a
  j   M   v

The data.frames have around 250 rows and 50 cols.

asked Mar 08 '13 10:03 by Elb


2 Answers

You can use the Reduce function to merge multiple data frames:

df_list <- list(DF1, DF2, DF3)
Reduce(function(x, y) merge(x, y, all=TRUE), df_list, accumulate=FALSE)
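Note that merge() takes exactly two data frames (x and y); in the call from the question, the third positional argument ends up matched to by.x, which is why fix.by() complains. Applied to the per-column count from the question, the Reduce idea might look like the rough sketch below. The data are the three example frames from the question, and count_matches is a hypothetical helper name, not something from base R. Only the column being compared is kept before merging, which assumes you only need the row count and not the other columns.

```r
# Example data copied from the question
DF1 <- data.frame(G1 = c("a", "b", "c"), G2 = c("b", "c", "d"), G3 = c("f", "a", "b"))
DF2 <- data.frame(G1 = c("A", "b", "h"), G2 = c("b", "c", "M"), G3 = c("f", "a", "b"))
DF3 <- data.frame(G1 = c("a", "b", "j"), G2 = c("b", "l", "M"), G3 = c("f", "a", "v"))
df_list <- list(DF1, DF2, DF3)

# Hypothetical helper: for column n, merge all frames on that column only
# and count the rows that survive the successive inner joins.
count_matches <- function(n, dfs) {
  single_col <- lapply(dfs, function(d) d[n])  # keep just column n in each frame
  nrow(Reduce(function(x, y) merge(x, y, by = n), single_col))
}

# One count per column name, analogous to the two-frame sapply in the question
matches <- sapply(names(DF1), count_matches, dfs = df_list)
print(matches)
```

Keep in mind that repeated values in a column multiply the joined rows, so the count is the number of matching row combinations rather than the number of distinct shared values.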

Or merge_recurse from the reshape package:

library(reshape)
data <- merge_recurse(df_list)

See also the R Wiki: Merge data frames

answered Oct 11 '22 03:10 by rcs


After researching this very same question for a couple of hours today, I came up with this simple but elegant solution, which combines the %>% pipe (available through 'dplyr') with the base R 'merge()' function.

library(dplyr)  # provides the %>% pipe

# Merge DF1 with DF2, then merge the result with DF3
MergedDF <- merge(DF1, DF2) %>%
              merge(DF3)

As you mention in your post, this assumes that the column names are the same in each data frame you are merging. Because no 'by' argument is given, merge() joins on every column name the data frames share, so those identifier columns appear only once in the result instead of being duplicated.
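To see what this chained merge does on the example data from the question, here is a small sketch in base R. It uses nested merge() calls instead of the pipe, purely to avoid the package dependency. Since all three frames share all their column names, merge() joins on every column, so only rows that are identical in all three frames are kept.

```r
# Example data copied from the question
DF1 <- data.frame(G1 = c("a", "b", "c"), G2 = c("b", "c", "d"), G3 = c("f", "a", "b"))
DF2 <- data.frame(G1 = c("A", "b", "h"), G2 = c("b", "c", "M"), G3 = c("f", "a", "b"))
DF3 <- data.frame(G1 = c("a", "b", "j"), G2 = c("b", "l", "M"), G3 = c("f", "a", "v"))

# With no 'by', merge() joins on all shared column names (G1, G2 and G3 here)
step1 <- merge(DF1, DF2)     # rows present in both DF1 and DF2
MergedDF <- merge(step1, DF3)  # of those, rows also present in DF3

print(step1)
print(nrow(MergedDF))
```

On this data only the row (b, c, a) is common to DF1 and DF2, and it does not occur in DF3, so the final result is empty. If you want matches per column rather than whole-row matches, pass an explicit 'by' to merge().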

answered Oct 11 '22 02:10 by Paul Sochacki