I have a huge data frame with the following columns (and some more, but these are not important). Here's an example:
  user_id video_id group_id   x   y
1       1        0        0  39 108
2       1        0        0  39 108
3       1       10        0 135 180
4       2        0        0  20 123
User, video and group IDs are factors, of course. For example, there are 20 videos, but each of them has several "observations" for each user and group.
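A tiny invented stand-in with this shape (values made up to match the rows above) makes the snippets below runnable:

# Hypothetical miniature of allData, for illustration only
allData <- data.frame(
  user_id  = factor(c(1, 1, 1, 2)),
  video_id = factor(c(0, 0, 10, 0)),
  group_id = factor(c(0, 0, 0, 0)),
  x = c(39, 39, 135, 20),
  y = c(108, 108, 180, 123)
)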
I'd like to transform this data frame into the following format, where there are as many x.N, y.N columns as there are users (N).
video_id x.1 y.1 x.2 y.2 …
       0  39 108  20 123
So, for video 0, the x and y values from user 1 are in columns x.1 and y.1, respectively. For user 2, their values are in columns x.2, y.2, and so on.
I made myself a list of data frames, one per user_id, each composed solely of the unique video_id, x, y observations:
library(plyr)
summaryList <- dlply(allData, .(user_id), function(x) unique(x[c("video_id", "x", "y")]))
This is what it looks like:
List of 15
$ 1 :'data.frame': 20 obs. of 3 variables:
..$ video_id: Factor w/ 20 levels "0","1","2","3",..: 1 11 8 5 12 9 20 13 7 10 ...
..$ x : int [1:20] 39 135 86 122 28 167 203 433 549 490 ...
..$ y : int [1:20] 108 180 164 103 187 128 185 355 360 368 ...
$ 2 :'data.frame': 20 obs. of 3 variables:
..$ video_id: Factor w/ 20 levels "0","1","2","3",..: 2 14 15 4 20 6 19 3 13 18 ...
..$ x : int [1:20] 128 688 435 218 528 362 299 134 83 417 ...
..$ y : int [1:20] 165 117 135 179 96 328 332 563 623 476 ...
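For what it's worth, the same list can be built without plyr; a base-R sketch:

# Base R equivalent of the dlply call above: split by user, drop duplicate rows
summaryList <- lapply(split(allData[c("video_id", "x", "y")], allData$user_id), unique)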
What's left to do is:

1. Merge the data frames in summaryList with each other, based on video_id. I can't find a nice way to access the actual data frames in the list, which are summaryList[1]$`1`, summaryList[2]$`2`, et cetera (see the indexing note after this list). @James found a partial solution:

Reduce(function(x,y) merge(x,y,by="video_id"),summaryList)

2. Ensure the column names are renamed after the user ID and not kept as-is. Right now my summaryList doesn't contain any info about the user ID, and the output of Reduce has duplicate column names like x.x y.x x.y y.y x.x y.x and so on.
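A quick note on that indexing: single brackets return a length-one sublist, which is why the awkward summaryList[1]$`1` form comes up; double brackets return the data frame itself:

summaryList[[1]]     # the data frame for the first user
summaryList[["2"]]   # or pick an element by name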
How do I go about doing this? Or is there any easier way to get to the result than what I'm currently doing?
I am still somewhat confused. However, I guess you simply want to melt and dcast.
library(reshape2)
# Stack x and y into long format, keyed by user and video
d <- melt(allData, id.vars = c("user_id", "video_id"), measure.vars = c("x", "y"))
# One row per video; one column per user/variable combination
dcast(d, video_id ~ user_id + variable, value.var = "value", fun.aggregate = mean)
Resulting in:
video_id 1_x 1_y 2_x 2_y 3_x 3_y 4_x 4_y 5_x 5_y 6_x 6_y 7_x 7_y 8_x 8_y 9_x 9_y 10_x 10_y 11_x 11_y 12_x 12_y 14_x 14_y 15_x 15_y 16_x 16_y
1 0 39 108 899 132 61 357 149 298 1105 415 148 208 442 200 210 134 58 244 910 403 152 52 1092 617 1012 114 1105 424 548 394
2 1 1125 70 128 165 1151 390 171 587 623 623 80 643 866 310 994 114 854 129 781 306 672 -1 1096 354 525 524 150
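If you prefer the question's x.1/y.1 naming over 1_x, the names can be rewritten afterwards. A sketch, with res as a hypothetical name for the dcast result:

res <- dcast(d, video_id ~ user_id + variable, value.var = "value", fun.aggregate = mean)
# Turn "1_x" into "x.1", "1_y" into "y.1", and so on
names(res) <- sub("^(\\d+)_(x|y)$", "\\2.\\1", names(res))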
Reduce does the trick:

reducedData <- Reduce(function(x,y) merge(x,y,by="video_id"),summaryList)

… but you need to fix the names afterwards:
# letters[24:25] is c("x","y"); expand.grid pairs each with every user name
names(reducedData)[-1] <- do.call(function(...) paste(..., sep = "."), expand.grid(letters[24:25], names(summaryList)))
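Since expand.grid varies its first argument fastest, the names come out interleaved as x.1, y.1, x.2, y.2, …. An equivalent, arguably more explicit version:

# Same result: x.<user>, y.<user> for each user in summaryList
names(reducedData)[-1] <- paste(rep(c("x", "y"), times = length(summaryList)),
                                rep(names(summaryList), each = 2), sep = ".")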
The result is:
  video_id  x.1 y.1 x.2 y.2  x.3 y.3 x.4 y.4  x.5 y.5 x.6 y.6 x.7 y.7 x.8
1        0   39 108 899 132   61 357 149 298 1105 415 148 208 442 200 210
2        1 1125  70 128 165 1151 390 171 587  623 623  80 643 866 310 994
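For completeness: with today's tidyverse, tidyr::pivot_wider reaches the same shape in one step. A sketch (note the columns come out grouped as x.1, x.2, …, then y.1, y.2, …, rather than interleaved):

library(tidyr)
# unique() drops the duplicated observations first, as in the dlply call
wide <- pivot_wider(unique(allData[c("user_id", "video_id", "x", "y")]),
                    id_cols = video_id,
                    names_from = user_id,
                    values_from = c(x, y),
                    names_sep = ".")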