I have three independent data.frames
. The three data.frames
have the same number of columns and the same number of rows. Additionally They have the same column names. I' m trying to merge the three data.frames according to column names. I'm using the following code wrote to merge two data.frames and return the number of matches.
Merged_DF = sapply(names(DF1),function(n) nrow(merge(DF1, DF2, by=n)))
The problem is that while in this example there are two data.frames, in my case I have 3 data.frames. How can I modify the code to merge three data.frames instead of two? I tried to modify the string in this way simply adding the third data.frame but it does not work:
Merged_DF = sapply(names(DF1),function(n) nrow(merge(DF1, DF2, DF3, by=n)))
It returns the following error:
Error in fix.by(by.x, x) : 'by' must specify column(s) as numbers, names or logical
Ex:
DF1
G1 G2 G3 a b f b c a c d b
DF2
G1 G2 G3 A b f b c a h M b
DF3
G1 G2 G3 a b f b l a j M v
The data.frames have around 250 rows and 50 cols.
We can use either pandas. merge() or DataFrame. merge() to merge multiple Dataframes. Merging multiple Dataframes is similar to SQL join and supports different types of join inner , left , right , outer , cross .
Join Multiple R DataFrames To join more than two (multiple) R dataframes, then reduce() is used. It is available in the tidyverse package which will convert all the dataframes to a list and join the dataframes based on the column.
The merge() function in base R can be used to merge input dataframes by common columns or row names. The merge() function retains all the row names of the dataframes, behaving similarly to the inner join. The dataframes are combined in order of the appearance in the input function call.
You can use the Reduce
function to merge multiple data frames:
df_list <- list(DF1, DF2, DF3)
Reduce(function(x, y) merge(x, y, all=TRUE), df_list, accumulate=FALSE)
Or merge_recurse
from the reshape
package:
library(reshape)
data <- merge_recurse(df_list)
See also the R Wiki: Merge data frames
After researching this very same question for a couple hours today, I came up with this simple but elegant solution using a combination of 'dplyr' pipes and the base R 'merge()' function.
MergedDF <- merge(DF1, DF2) %>%
merge(DF3)
As you mention in your post, this assumes that the column names are the same and that there's the same number of rows in each data frame you are merging. This will also automatically eliminate any duplicate columns (i.e., identifiers) that were used in the merging process.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With