 

Merge data frames from a list with each other

What I need:

I have a huge data frame with the following columns (and some more, but these are not important). Here's an example:

    user_id video_id group_id    x   y
1         1        0        0   39 108
2         1        0        0   39 108
3         1       10        0  135 180
4         2        0        0   20 123

User, video and group IDs are factors, of course. For example, there are 20 videos, but each of them has several "observations" for each user and group.

I'd like to transform this data frame into the following format, with one pair of x.N, y.N columns for each user N:

video_id  x.1   y.1  x.2  y.2  …
       0   39   108   20  123

So, for video 0, the x and y values from user 1 are in columns x.1 and y.1, respectively. For user 2, their values are in columns x.2, y.2, and so on.

What I've tried:

I made myself a list of data frames, one per user_id, each containing only the distinct video_id, x, y observations for that user:

library(plyr)  # provides dlply()
summaryList <- dlply(allData, .(user_id), function(x) unique(x[c("video_id","x","y")]))

This is what it looks like:

List of 15
 $ 1 :'data.frame': 20 obs. of  3 variables:
  ..$ video_id: Factor w/ 20 levels "0","1","2","3",..: 1 11 8 5 12 9 20 13 7 10 ...
  ..$ x       : int [1:20] 39 135 86 122 28 167 203 433 549 490 ...
  ..$ y       : int [1:20] 108 180 164 103 187 128 185 355 360 368 ...
 $ 2 :'data.frame': 20 obs. of  3 variables:
  ..$ video_id: Factor w/ 20 levels "0","1","2","3",..: 2 14 15 4 20 6 19 3 13 18 ...
  ..$ x       : int [1:20] 128 688 435 218 528 362 299 134 83 417 ...
  ..$ y       : int [1:20] 165 117 135 179 96 328 332 563 623 476 ...
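If plyr isn't available, a roughly equivalent per-user list can be built in base R with split() and lapply(). This is a sketch that assumes allData has the user_id, video_id, x and y columns shown above; the small allData defined here is made-up toy data for illustration only.

```r
# Hypothetical toy data standing in for allData (values are made up)
allData <- data.frame(
  user_id  = factor(c(1, 1, 1, 2, 2)),
  video_id = factor(c(0, 0, 10, 0, 10)),
  group_id = factor(c(0, 0, 0, 0, 0)),
  x        = c(39L, 39L, 135L, 20L, 25L),
  y        = c(108L, 108L, 180L, 123L, 130L)
)

# Split by user_id (drop = TRUE skips factor levels that have no rows), then
# keep only the distinct (video_id, x, y) rows of each piece.
# The list names are the user IDs, as with dlply().
summaryList <- lapply(split(allData, allData$user_id, drop = TRUE),
                      function(d) unique(d[c("video_id", "x", "y")]))

str(summaryList)
```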

Where I'm stuck:

What's left to do is:

  • Merge all the data frames in summaryList with one another on video_id. I can't find a nice way to access the actual data frames in the list, which are summaryList[1]$`1`, summaryList[2]$`2`, and so on.

    @James found a partial solution:

    Reduce(function(x,y) merge(x,y,by="video_id"),summaryList)
    
  • Name the columns after the user ID instead of keeping the defaults. Right now the data frames in my summaryList don't contain the user ID, and the output of Reduce has duplicate column names like x.x y.x x.y y.y x.x y.x and so on.
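On accessing the list elements: single brackets return a one-element sub-list, which is why summaryList[1]$`1` is needed, while double brackets (or $ with the name) return the data frame itself. A small sketch with a made-up two-element list shaped like summaryList:

```r
# Hypothetical two-user list shaped like summaryList (values are made up)
summaryList <- list(
  `1` = data.frame(video_id = c("0", "10"), x = c(39, 135), y = c(108, 180)),
  `2` = data.frame(video_id = c("0", "10"), x = c(20, 25),  y = c(123, 130))
)

class(summaryList[1])     # "list": a one-element sub-list
class(summaryList[[1]])   # "data.frame": the element itself
identical(summaryList[[1]], summaryList$`1`)   # same data frame, by position or by name

# Looping over the names keeps each user ID available next to its data frame
for (id in names(summaryList)) {
  cat("user", id, "has", nrow(summaryList[[id]]), "rows\n")
}
```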

How do I go about doing this? Or is there any easier way to get to the result than what I'm currently doing?

asked Dec 19 '12 at 13:12 by slhck




2 Answers

I am still somewhat confused. However, I guess you simply want to melt and dcast.

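library(reshape2)
# Long format: one row per user_id/video_id/variable, with x and y stacked in "value"
d <- melt(allData, id.vars = c("user_id", "video_id"), measure.vars = c("x", "y"))
# Wide format: one row per video_id, one column per user_id/variable pair;
# repeated observations for the same cell are averaged by fun.aggregate = mean
dcast(d, video_id ~ user_id + variable, value.var = "value", fun.aggregate = mean)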

Resulting in:

 video_id  1_x 1_y  2_x 2_y  3_x 3_y  4_x 4_y  5_x 5_y  6_x 6_y  7_x 7_y  8_x 8_y  9_x 9_y 10_x 10_y 11_x 11_y 12_x 12_y 14_x 14_y 15_x 15_y 16_x 16_y
1         0   39 108  899 132   61 357  149 298 1105 415  148 208  442 200  210 134   58 244  910  403  152   52 1092  617 1012  114 1105  424  548  394
2         1 1125  70  128 165 1151 390  171 587  623 623   80 643  866 310  994 114  854 129  781  306  672   -1 1096  354  525  524  150 
answered Nov 15 '22 at 08:11 by Roland

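For comparison, a base R sketch of the same melt/dcast idea, assuming only the user_id, video_id, x and y columns matter. aggregate() first averages repeated observations (playing the role of fun.aggregate = mean in the dcast call), because reshape() would otherwise keep only the first of several matching rows, with a warning. reshape() with direction = "wide" then spreads the users into x.<user>/y.<user> columns, the naming asked for in the question. The toy allData is made up for illustration.

```r
# Hypothetical toy data standing in for allData (values are made up)
allData <- data.frame(
  user_id  = factor(c(1, 1, 1, 2, 2)),
  video_id = factor(c(0, 0, 10, 0, 10)),
  x        = c(39, 39, 135, 20, 25),
  y        = c(108, 108, 180, 123, 130)
)

# Long table with one row per (video_id, user_id); repeated rows are averaged
agg <- aggregate(cbind(x, y) ~ video_id + user_id, data = allData, FUN = mean)

# Wide table: one row per video, columns x.1, y.1, x.2, y.2, ...
wide <- reshape(agg, idvar = "video_id", timevar = "user_id",
                direction = "wide", sep = ".")
rownames(wide) <- NULL
wide
```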


Reduce does the trick:

reducedData <- Reduce(function(x,y) merge(x,y,by="video_id"),summaryList)

… but you need to fix the names afterwards:

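# letters[24:25] is c("x", "y"); expand.grid() varies it fastest, giving "x.1", "y.1", "x.2", "y.2", ...
names(reducedData)[-1] <- do.call(function(...) paste(...,sep="."),expand.grid(letters[24:25],names(summaryList)))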

The result is:

   video_id  x.1 y.1  x.2 y.2  x.3 y.3  x.4 y.4  x.5 y.5  x.6 y.6  x.7 y.7  x.8
1         0   39 108  899 132   61 357  149 298 1105 415  148 208  442 200  210
2         1 1125  70  128 165 1151 390  171 587  623 623   80 643  866 310  994
answered Nov 15 '22 at 07:11 by James

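An alternative sketch that avoids fixing the names after the Reduce/merge step: rename each data frame's x and y columns to x.<id> and y.<id> before merging, taking the IDs from the list names (which dlply() sets to the user IDs). Because no two frames then share a non-key column, merge() never has to add .x/.y suffixes. merge() performs an inner join by default, so a video missing for any one user is dropped; all = TRUE keeps such videos with NA instead. The two-element list below uses toy values for illustration.

```r
# Hypothetical list shaped like summaryList: names are user IDs (toy values)
summaryList <- list(
  `1` = data.frame(video_id = c("0", "1", "2"), x = c(39, 1125, 86), y = c(108, 70, 164)),
  `2` = data.frame(video_id = c("0", "1"),      x = c(899, 128),     y = c(132, 165))
)

# Give every frame user-specific column names: video_id, x.<id>, y.<id>
renamed <- Map(function(df, id) {
  is_value <- names(df) != "video_id"
  names(df)[is_value] <- paste(names(df)[is_value], id, sep = ".")
  df
}, summaryList, names(summaryList))

# Outer join on video_id; all = TRUE keeps videos that some users lack (as NA)
merged <- Reduce(function(a, b) merge(a, b, by = "video_id", all = TRUE), renamed)
merged
```

Dropping all = TRUE gives the inner-join behaviour of the original Reduce call.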