Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How can I merge multiple dataframes with the same column names?

What I have:

I have a "master" dataframe that has the following columns:

userid, condition

Since there are four experiment conditions, I also have four dataframes that carry answer information, with the following columns:

userid, condition, answer1, answer2

Now, I'd like to join these, so all combinations of user IDs, conditions and their answers to these conditions are merged. Each condition should only have the correct answer in the appropriate column, per row.


Short, self-contained example:

master = data.frame(userid=c("foo","foo","foo","foo","bar","bar","bar","bar"), condition=c("A","B","C","D","A","B","C","D"))
cond_a = data.frame(userid=c("foo","bar"), condition="A", answer1=c("1","1"), answer2=c("2","2"))
cond_b = data.frame(userid=c("foo","bar"), condition="B", answer1=c("3","3"), answer2=c("4","4"))
cond_c = data.frame(userid=c("foo","bar"), condition="C", answer1=c("5","5"), answer2=c("6","6"))
cond_d = data.frame(userid=c("foo","bar"), condition="D", answer1=c("7","7"), answer2=c("8","8"))

How do I merge all conditions into the master, so the master table looks like follows?

  userid condition answer1 answer2
1    bar         A       1       2
2    bar         B       3       4
3    bar         C       5       6
4    bar         D       7       8
5    foo         A       1       2
6    foo         B       3       4
7    foo         C       5       6
8    foo         D       7       8

I've tried the following:

temp = merge(master, cond_a, all.x=TRUE)

Which gives me:

  userid condition answer1 answer2
1    bar         A       1       2
2    bar         B    <NA>    <NA>
3    bar         C    <NA>    <NA>
4    bar         D    <NA>    <NA>
5    foo         A       1       2
6    foo         B    <NA>    <NA>
7    foo         C    <NA>    <NA>
8    foo         D    <NA>    <NA>

But as soon as I do this…

merge(temp, cond_b, all.x=TRUE)

There are no values for condition B. How come?

  userid condition answer1 answer2
1    bar         A       1       2
2    bar         B    <NA>    <NA>
3    bar         C    <NA>    <NA>
4    bar         D    <NA>    <NA>
5    foo         A       1       2
6    foo         B    <NA>    <NA>
7    foo         C    <NA>    <NA>
8    foo         D    <NA>    <NA>
like image 864
slhck Avatar asked Dec 08 '12 14:12

slhck


People also ask

How do you merge DataFrames based on column names?

To merge two Pandas DataFrame with common column, use the merge() function and set the ON parameter as the column name.

How do I merge multiple DataFrames in a column?

Pandas merge() function is used to merge multiple Dataframes. We can use either pandas. merge() or DataFrame. merge() to merge multiple Dataframes.

How do I merge two DataFrames with the same column names in R?

To combine two data frames with same columns in R language, call rbind() function, and pass the two data frames, as arguments. rbind() function returns the resulting data frame created from concatenating the given two data frames. For rbind() function to combine the given data frames, the column names must match.

Which are the 3 main ways of combining DataFrames together?

Combine data from multiple files into a single DataFrame using merge and concat. Combine two DataFrames using a unique ID found in both DataFrames. Employ to_csv to export a DataFrame in CSV format. Join DataFrames using common fields (join keys).


1 Answers

You can use Reduce() and complete.cases() as follows:

merged <- Reduce(function(x, y) merge(x, y, all=TRUE), 
                 list(master, cond_a, cond_b, cond_c, cond_d))
merged[complete.cases(merged), ]
#    userid condition answer1 answer2
# 1     bar         A       1       2
# 2     bar         B       3       4
# 4     bar         C       5       6
# 6     bar         D       7       8
# 8     foo         A       1       2
# 9     foo         B       3       4
# 11    foo         C       5       6
# 13    foo         D       7       8

Reduce() might take some getting accustomed to. You define your function, and then provide a list of objects to repeatedly apply the function to. Thus, that statement is like doing:

temp1 <- merge(master, cond_a, all=TRUE)
temp2 <- merge(temp1, cond_b, all=TRUE)
temp3 <- merge(temp2, ....)

Or something like:

merge(merge(merge(master, cond_a, all=TRUE), cond_b, all=TRUE), cond_c, all=TRUE)

complete.cases() creates a logical vector of whether the specified columns are "complete" or not; this logical vector can be used to subset from the merged data.frame.

like image 152
A5C1D2H2I1M1N2O1R2T1 Avatar answered Sep 27 '22 21:09

A5C1D2H2I1M1N2O1R2T1