
Merging a data frame from a list of data frames [duplicate]




I have a list of data frames that looks like this:


 month year   oracle
    1 2004 356.0000
    2 2004 390.0000
    3 2004 394.4286
    4 2004 391.8571

 month year microsoft
    1 2004  339.0000
    2 2004  357.7143
    3 2004  347.1429
    4 2004  333.2857

How do I create a single data frame that looks like this:

 month year   oracle   microsoft
    1 2004 356.0000    339.0000
    2 2004 390.0000    357.7143
    3 2004 394.4286    347.1429
    4 2004 391.8571    333.2857
anonymous asked Oct 20 '15 05:10


2 Answers

We could also use Reduce to apply merge pairwise across the list, joining on the shared key columns:

Reduce(function(...) merge(..., by = c('month', 'year')), lst)
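For a reproducible run, here is a minimal sketch that rebuilds the two data frames from the question and applies the same Reduce/merge call. The names oracle_df, microsoft_df and merged are hypothetical, and the explicit function(x, y) form is used only for readability; it behaves the same as function(...) here.

```r
# Rebuild the question's data frames (values copied from the post).
oracle_df <- data.frame(month = 1:4, year = 2004L,
                        oracle = c(356, 390, 394.4286, 391.8571))
microsoft_df <- data.frame(month = 1:4, year = 2004L,
                           microsoft = c(339, 357.7143, 347.1429, 333.2857))
lst <- list(oracle_df, microsoft_df)

# Reduce() calls merge() on the first two elements, then merges the
# result with the third element, and so on through the list.
merged <- Reduce(function(x, y) merge(x, y, by = c("month", "year")), lst)
print(merged)
```

Because both data frames cover the same months, the result should have four rows and the columns month, year, oracle and microsoft.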

Using @Jaap's example, if the month and year values are not the same in every data frame, use the all = TRUE option of merge to keep all rows:

Reduce(function(...) merge(..., by = c('month', 'year'), all=TRUE), ls)
#     month year   oracle microsoft   google
#1     1 2004 356.0000        NA       NA
#2     2 2004 390.0000  339.0000       NA
#3     3 2004 394.4286  357.7143 390.0000
#4     4 2004 391.8571  347.1429 391.8571
#5     5 2004       NA  333.2857 357.7143
#6     6 2004       NA        NA 333.2857
akrun answered Oct 13 '22 00:10


The Reduce/merge code from @akrun's answer works well if the values of the month and year columns are the same in each data frame. However, when they are not the same (example data at the end of this answer),

Reduce(function(...) merge(..., by = c('month', 'year')), ls)

will return only the rows whose month and year values occur in every data frame:

  month year   oracle microsoft   google
1     3 2004 394.4286  357.7143 390.0000
2     4 2004 391.8571  347.1429 391.8571

In that case, when you want to include all rows/observations, you can either use all = TRUE (as shown by @akrun) or use full_join from the dplyr package as an alternative:

library(dplyr)
Reduce(function(...) full_join(..., by = c('month', 'year')), ls)
# or just (without by, full_join joins on all shared columns, here month and year):
Reduce(full_join, ls)

This results in:

  month year   oracle microsoft   google
1     1 2004 356.0000        NA       NA
2     2 2004 390.0000  339.0000       NA
3     3 2004 394.4286  357.7143 390.0000
4     4 2004 391.8571  347.1429 391.8571
5     5 2004       NA  333.2857 357.7143
6     6 2004       NA        NA 333.2857
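To compare the two behaviours side by side without any package, here is a small base R sketch. It assumes the same three data frames as the example data in this answer, rebuilt with data.frame(); the names dfs, keys, inner and outer are hypothetical (dfs is used rather than ls only to avoid masking base R's ls() function).

```r
# Same values as the example data in this answer, built with data.frame().
dfs <- list(
  data.frame(month = 1:4, year = 2004L,
             oracle = c(356, 390, 394.4286, 391.8571)),
  data.frame(month = 2:5, year = 2004L,
             microsoft = c(339, 357.7143, 347.1429, 333.2857)),
  data.frame(month = 3:6, year = 2004L,
             google = c(390, 391.8571, 357.7143, 333.2857))
)
keys <- c("month", "year")

# Default merge(): keeps only key combinations present in every data frame.
inner <- Reduce(function(x, y) merge(x, y, by = keys), dfs)

# all = TRUE: keeps every key combination and fills the gaps with NA.
outer <- Reduce(function(x, y) merge(x, y, by = keys, all = TRUE), dfs)

nrow(inner)  # 2 (months 3 and 4)
nrow(outer)  # 6 (months 1 to 6)
```

Comparing nrow() of the two results is a quick way to check whether the default merge silently dropped rows.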

Used data:

ls <- list(structure(list(month = 1:4, year = c(2004L, 2004L, 2004L, 2004L), oracle = c(356, 390, 394.4286, 391.8571)), .Names = c("month", "year", "oracle"), class = "data.frame", row.names = c(NA, -4L)), 
           structure(list(month = 2:5, year = c(2004L, 2004L, 2004L, 2004L), microsoft = c(339, 357.7143, 347.1429, 333.2857)), .Names = c("month", "year", "microsoft"), class = "data.frame", row.names = c(NA,-4L)),
           structure(list(month = 3:6, year = c(2004L, 2004L, 2004L, 2004L), google = c(390, 391.8571, 357.7143, 333.2857)), .Names = c("month", "year", "google"), class = "data.frame", row.names = c(NA,-4L)))
Jaap answered Oct 13 '22 01:10
