Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Merge df1, df2, df3 dataframes in R on a specified column

In R, I have df1, df2, and df3 that represent lightning storms. Each df has two columns, 'city' and 'injuries'.

df1 = data.frame(city=c("atlanta", "new york"), injuries=c(5,8))
df2 = data.frame(city=c("chicago", "new york"), injuries=c(2,3))
df3 = data.frame(city=c("los angeles", "atlanta"), injuries=c(1,7))

I want to merge all 3 data frames on a type of outer join on the city column so that all the cities will show up in the combined dataframe and the injury counts will be summed like this:

combined.df

city         df1.freq   df2.freq   df3.freq
atlanta      5          0          7
new york     8          3          0
chicago      0          2          0
los angeles  0          0          1
like image 665
Andrew Heekin Avatar asked Feb 14 '23 11:02

Andrew Heekin


2 Answers

This is general to any number of data.frames:

library(functional)
Reduce(Curry(merge, by = "city", all = TRUE), list(df1, df2, df3))
#          city injuries.x injuries.y injuries
# 1     atlanta          5         NA        7
# 2    new york          8          3       NA
# 3     chicago         NA          2       NA
# 4 los angeles         NA         NA        1

However, multiple merges might be slow. Another approach would be to stack your data.frames into a long one:

df.long <- do.call(rbind, Map(transform, list(df1, df2, df3),
                                         name = c("df1", "df2", "df3")))
#          city injuries name
# 1     atlanta        5  df1
# 2    new york        8  df1
# 3     chicago        2  df2
# 4    new york        3  df2
# 5 los angeles        1  df3
# 6     atlanta        7  df3

Then reshape that data using xtabs for example:

xtabs(injuries ~ city + name, df.long)
#              name
# city          df1 df2 df3
#   atlanta       5   0   7
#   new york      8   3   0
#   chicago       0   2   0
#   los angeles   0   0   1

(The reshape function might also be useful for that last step, but I am not super familiar with it.)

like image 147
flodel Avatar answered Feb 17 '23 12:02

flodel


merge is your friend. Type ?merge for more details.

> merge(merge(df1, df2, by = "city", all = TRUE), df3, by = "city", all = TRUE)
         city injuries.x injuries.y injuries
1     atlanta          5         NA        7
2     chicago         NA          2       NA
3 los angeles         NA         NA        1
4    new york          8          3       NA

Edit. While I like @flodel's solution, here is a more straightforward one, which may be somewhat easier to understand:

 Reduce(function(d1, d2) merge(d1, d2, all = TRUE, by = "city"), list(df1, df2, df3))
like image 22
Victor K. Avatar answered Feb 17 '23 11:02

Victor K.