I have a "master" dataframe that has the following columns:
userid, condition
Since there are four experiment conditions, I also have four dataframes that carry answer information, with the following columns:
userid, condition, answer1, answer2
Now, I'd like to join these, so all combinations of user IDs, conditions and their answers to these conditions are merged. Each condition should only have the correct answer in the appropriate column, per row.
master = data.frame(userid=c("foo","foo","foo","foo","bar","bar","bar","bar"), condition=c("A","B","C","D","A","B","C","D"))
cond_a = data.frame(userid=c("foo","bar"), condition="A", answer1=c("1","1"), answer2=c("2","2"))
cond_b = data.frame(userid=c("foo","bar"), condition="B", answer1=c("3","3"), answer2=c("4","4"))
cond_c = data.frame(userid=c("foo","bar"), condition="C", answer1=c("5","5"), answer2=c("6","6"))
cond_d = data.frame(userid=c("foo","bar"), condition="D", answer1=c("7","7"), answer2=c("8","8"))
How do I merge all conditions into the master, so that the master table looks as follows?
userid condition answer1 answer2
1 bar A 1 2
2 bar B 3 4
3 bar C 5 6
4 bar D 7 8
5 foo A 1 2
6 foo B 3 4
7 foo C 5 6
8 foo D 7 8
I've tried the following:
temp = merge(master, cond_a, all.x=TRUE)
Which gives me:
userid condition answer1 answer2
1 bar A 1 2
2 bar B <NA> <NA>
3 bar C <NA> <NA>
4 bar D <NA> <NA>
5 foo A 1 2
6 foo B <NA> <NA>
7 foo C <NA> <NA>
8 foo D <NA> <NA>
But as soon as I do this…
merge(temp, cond_b, all.x=TRUE)
There are no values for condition B. How come?
userid condition answer1 answer2
1 bar A 1 2
2 bar B <NA> <NA>
3 bar C <NA> <NA>
4 bar D <NA> <NA>
5 foo A 1 2
6 foo B <NA> <NA>
7 foo C <NA> <NA>
8 foo D <NA> <NA>
You can use Reduce() and complete.cases() as follows:
merged <- Reduce(function(x, y) merge(x, y, all=TRUE),
list(master, cond_a, cond_b, cond_c, cond_d))
merged[complete.cases(merged), ]
# userid condition answer1 answer2
# 1 bar A 1 2
# 2 bar B 3 4
# 4 bar C 5 6
# 6 bar D 7 8
# 8 foo A 1 2
# 9 foo B 3 4
# 11 foo C 5 6
# 13 foo D 7 8
Reduce() might take some getting accustomed to. You define a two-argument function, and then provide a list of objects to repeatedly apply the function to. Thus, that statement is like doing:
temp1 <- merge(master, cond_a, all=TRUE)
temp2 <- merge(temp1, cond_b, all=TRUE)
temp3 <- merge(temp2, ....)
Or something like:
merge(merge(merge(master, cond_a, all=TRUE), cond_b, all=TRUE), cond_c, all=TRUE)
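Each merge() in that chain joins on every column name the two inputs share, which is the default when no by argument is given. The following sketch reuses the question's data (stringsAsFactors = FALSE is an addition here, only to keep the columns as plain character vectors) and shows why a left join with all.x=TRUE leaves condition B empty, while all=TRUE keeps the answers as extra rows:

```r
# Data from the question; stringsAsFactors = FALSE is an assumption added for clarity.
master <- data.frame(userid = rep(c("foo", "bar"), each = 4),
                     condition = rep(c("A", "B", "C", "D"), times = 2),
                     stringsAsFactors = FALSE)
cond_a <- data.frame(userid = c("foo", "bar"), condition = "A",
                     answer1 = c("1", "1"), answer2 = c("2", "2"),
                     stringsAsFactors = FALSE)
cond_b <- data.frame(userid = c("foo", "bar"), condition = "B",
                     answer1 = c("3", "3"), answer2 = c("4", "4"),
                     stringsAsFactors = FALSE)

temp <- merge(master, cond_a, all.x = TRUE)

# Columns merge() will join on by default: all names common to both inputs.
shared <- intersect(names(temp), names(cond_b))
print(shared)

# Left join: unmatched rows of cond_b are dropped, so B stays NA.
left_only <- merge(temp, cond_b, all.x = TRUE)

# Full join: unmatched rows of cond_b are kept as additional rows.
full <- merge(temp, cond_b, all = TRUE)
print(full)
```

After the first merge, answer1 and answer2 are part of the join key, so the NA placeholders for condition B never match the values in cond_b. The full join keeps both the NA rows and the answer rows, which is why the result needs the complete.cases() filter.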
complete.cases() returns a logical vector indicating which rows have no missing values; this logical vector can be used to subset the merged data.frame.
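As an alternative sketch (not part of the approach above): since the four condition data frames have identical columns, they could be stacked with rbind() first and then merged into master once, joining only on userid and condition. The name answers and the stringsAsFactors = FALSE argument are additions made for this illustration.

```r
# Data from the question; stringsAsFactors = FALSE is an assumption added for clarity.
master <- data.frame(userid = rep(c("foo", "bar"), each = 4),
                     condition = rep(c("A", "B", "C", "D"), times = 2),
                     stringsAsFactors = FALSE)
cond_a <- data.frame(userid = c("foo", "bar"), condition = "A",
                     answer1 = c("1", "1"), answer2 = c("2", "2"),
                     stringsAsFactors = FALSE)
cond_b <- data.frame(userid = c("foo", "bar"), condition = "B",
                     answer1 = c("3", "3"), answer2 = c("4", "4"),
                     stringsAsFactors = FALSE)
cond_c <- data.frame(userid = c("foo", "bar"), condition = "C",
                     answer1 = c("5", "5"), answer2 = c("6", "6"),
                     stringsAsFactors = FALSE)
cond_d <- data.frame(userid = c("foo", "bar"), condition = "D",
                     answer1 = c("7", "7"), answer2 = c("8", "8"),
                     stringsAsFactors = FALSE)

# Stack the per-condition answers into one long table (hypothetical name).
answers <- rbind(cond_a, cond_b, cond_c, cond_d)

# One left join on the two key columns only; no NA rows to filter afterwards.
result <- merge(master, answers, by = c("userid", "condition"), all.x = TRUE)
print(result)
```

Naming the by columns explicitly keeps answer1 and answer2 out of the join key, so a single merge fills every row.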