In R, I have df1, df2, and df3 that represent lightning storms. Each df has two columns, 'city' and 'injuries'.
df1 = data.frame(city=c("atlanta", "new york"), injuries=c(5,8))
df2 = data.frame(city=c("chicago", "new york"), injuries=c(2,3))
df3 = data.frame(city=c("los angeles", "atlanta"), injuries=c(1,7))
I want to merge all 3 data frames on a type of outer join on the city column so that all the cities will show up in the combined dataframe and the injury counts will be summed like this:
combined.df
city df1.freq df2.freq df3.freq
atlanta 5 0 7
new york 8 3 0
chicago 0 2 0
los angeles 0 0 1
This is general to any number of data.frames:
library(functional)
Reduce(Curry(merge, by = "city", all = TRUE), list(df1, df2, df3))
# city injuries.x injuries.y injuries
# 1 atlanta 5 NA 7
# 2 new york 8 3 NA
# 3 chicago NA 2 NA
# 4 los angeles NA NA 1
However, multiple merges might be slow. Another approach would be to stack your data.frames into a long one:
df.long <- do.call(rbind, Map(transform, list(df1, df2, df3),
name = c("df1", "df2", "df3")))
# city injuries name
# 1 atlanta 5 df1
# 2 new york 8 df1
# 3 chicago 2 df2
# 4 new york 3 df2
# 5 los angeles 1 df3
# 6 atlanta 7 df3
Then reshape that data using xtabs
for example:
xtabs(injuries ~ city + name, df.long)
# name
# city df1 df2 df3
# atlanta 5 0 7
# new york 8 3 0
# chicago 0 2 0
# los angeles 0 0 1
(The reshape
function might also be useful for that last step, but I am not super familiar with it.)
merge
is your friend. Type ?merge
for more details.
> merge(merge(df1, df2, by = "city", all = TRUE), df3, by = "city", all = TRUE)
city injuries.x injuries.y injuries
1 atlanta 5 NA 7
2 chicago NA 2 NA
3 los angeles NA NA 1
4 new york 8 3 NA
Edit. While I like @flodel's solution, here is a more straightforward one, which may be somewhat easier to understand:
Reduce(function(d1, d2) merge(d1, d2, all = TRUE, by = "city"), list(df1, df2, df3))
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With