Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to merge two data frames on common columns in R with sum of others?

Tags:

merge

r

plyr

R Version 2.11.1 32-bit on Windows 7

I got two data sets: data_A and data_B:

data_A

USER_A USER_B ACTION 1      11     0.3 1      13     0.25 1      16     0.63 1      17     0.26 2      11     0.14 2      14     0.28 

data_B

USER_A USER_B ACTION 1      13     0.17 1      14     0.27 2      11     0.25 

Now I want to add the ACTION of data_B to the data_A if their USER_A and USER_B are equal. As the example above, the result would be:

data_A

USER_A USER_B ACTION 1      11     0.3 1      13     0.25+0.17 1      16     0.63 1      17     0.26 2      11     0.14+0.25 2      14     0.28 

So how could I achieve it?

like image 541
PepsiCo Avatar asked Apr 24 '11 07:04

PepsiCo


People also ask

How do I combine two data frames in R based on column?

The merge() function in base R can be used to merge input dataframes by common columns or row names. The merge() function retains all the row names of the dataframes, behaving similarly to the inner join. The dataframes are combined in order of the appearance in the input function call.

How do I combine two columns of datasets in R?

To join two data frames (datasets) vertically, use the rbind function. The two data frames must have the same variables, but they do not have to be in the same order. If data frameA has variables that data frameB does not, then either: Delete the extra variables in data frameA or.

How do I merge two data frames in R?

In R we use merge() function to merge two dataframes in R. This function is present inside join() function of dplyr package. The most important condition for joining two dataframes is that the column type should be the same on which the merging happens. merge() function works similarly like join in DBMS.

How do I combine multiple columns into one in R?

How do I concatenate two columns in R? To concatenate two columns you can use the <code>paste()</code> function. For example, if you want to combine the two columns A and B in the dataframe df you can use the following code: <code>df['AB'] <- paste(df$A, df$B)</code>.


1 Answers

You can use ddply in package plyr and combine it with merge:

library(plyr) ddply(merge(data_A, data_B, all.x=TRUE),    .(USER_A, USER_B), summarise, ACTION=sum(ACTION)) 

Notice that merge is called with the parameter all.x=TRUE - this returns all of the values in the first data.frame passed to merge, i.e. data_A:

  USER_A USER_B ACTION 1      1     11   0.30 2      1     13   0.25 3      1     16   0.63 4      1     17   0.26 5      2     11   0.14 6      2     14   0.28 
like image 195
Andrie Avatar answered Oct 10 '22 09:10

Andrie