Combine rows that have same value in a variable in R [duplicate]

Tags: r

I created the following data frame in R:

V1 <- c(1,3,2,6,7,7,5,3,1,1)
V2 <- c("rot", "grün", "grün", "gelb", "blau", "rot", "grün", "blau",    
"blau", "schwarz")
V3 <- c(44,23,28,23,88,88,44,28,11,44)
df <- as.data.frame(cbind(V1, V2, V3))
df

   V1      V2 V3
1   1     rot 44
2   3    grün 23
3   2    grün 28
4   6    gelb 23
5   7    blau 88
6   7     rot 88
7   5    grün 44
8   3    blau 28
9   1    blau 11
10  1 schwarz 44

V3 is the variable I want to use to rearrange the data set. The result should be a data frame with one row per distinct value of V3, with the V1 and V2 values from every matching row placed side by side in that row.

For this example, I want something like this:

V3  V1.1  V2.1  V1.2  V2.2  V1.3  V2.3
11  1     blau  NA    NA    NA    NA
23  3     grün  6     gelb  NA    NA
28  2     grün  3     blau  NA    NA
44  1     rot   5     grün  1     schwarz
88  7     blau  7     rot   NA    NA

Is there a function that can do this? Thanks for your help!

Sandy asked Feb 10 '23 08:02


2 Answers

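# df is the data frame from the question; 'time' numbers the rows within each V3 group
df2 <- transform(df, time = ave(seq_len(nrow(df)), V3, FUN = seq_along))
reshape(df2, direction = 'wide', idvar = 'V3')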
##   V3 V1.1 V2.1 V1.2 V2.2 V1.3    V2.3
## 1 44    1  rot    5 grün    1 schwarz
## 2 23    3 grün    6 gelb <NA>    <NA>
## 3 28    2 grün    3 blau <NA>    <NA>
## 5 88    7 blau    7  rot <NA>    <NA>
## 9 11    1 blau <NA> <NA> <NA>    <NA>
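
To make the one-liner easier to follow, here is the same idea split into steps. This is only a sketch: it rebuilds the sample data with data.frame() (an assumption, so that V1 and V3 stay numeric instead of being coerced by cbind), and the final sort by V3 is an optional extra that the call above does not do.

```r
# Sample data from the question, rebuilt with data.frame() (assumption:
# numeric V1 and V3 are wanted rather than the character columns cbind() gives)
df <- data.frame(
  V1 = c(1, 3, 2, 6, 7, 7, 5, 3, 1, 1),
  V2 = c("rot", "grün", "grün", "gelb", "blau", "rot", "grün", "blau",
         "blau", "schwarz"),
  V3 = c(44, 23, 28, 23, 88, 88, 44, 28, 11, 44)
)

# Step 1: number the rows within each V3 group (1, 2, 3, ...)
df$time <- ave(seq_len(nrow(df)), df$V3, FUN = seq_along)

# Step 2: spread V1 and V2 into columns suffixed with that number
wide <- reshape(df, direction = "wide", idvar = "V3", timevar = "time")

# Step 3 (optional): sort by V3 and drop the leftover row names
wide <- wide[order(wide$V3), ]
rownames(wide) <- NULL
wide
```

reshape() keeps the V1/V2 pair of each occurrence next to each other (V1.1, V2.1, V1.2, V2.2, ...), which is the column order asked for in the question.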
bgoldst answered Feb 11 '23 22:02


Here is one option using dcast from the devel version of data.table.

We convert the data.frame to a data.table with setDT(df1), create a sequence column 'indx' that numbers the rows within each group of 'V3', and then dcast from 'long' to 'wide' format. In the devel version, dcast can take multiple value.var columns.

library(data.table)  # v1.9.5+
setDT(df1)[, indx := 1:.N, by = V3]  # create sequence variable within each V3 group
dcast(df1, V3 ~ indx, value.var = c('V1', 'V2'), sep = ".")
#    V3 V1.1 V1.2 V1.3 V2.1 V2.2    V2.3
#1: 11    1   NA   NA blau   NA      NA
#2: 23    3    6   NA grün gelb      NA
#3: 28    2    3   NA grün blau      NA
#4: 44    1    5    1  rot grün schwarz
#5: 88    7    7   NA blau  rot      NA
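
As a possible variation, the helper column can be skipped by calling rowid() inside the dcast formula. This is a sketch that assumes a data.table release that provides rowid() (1.9.8 or later, as far as I know); the name dt is just for illustration.

```r
library(data.table)  # assumption: a version that provides rowid()

# Sample data from the question, built directly as a data.table
dt <- data.table(
  V1 = c(1, 3, 2, 6, 7, 7, 5, 3, 1, 1),
  V2 = c("rot", "grün", "grün", "gelb", "blau", "rot", "grün", "blau",
         "blau", "schwarz"),
  V3 = c(44, 23, 28, 23, 88, 88, 44, 28, 11, 44)
)

# rowid(V3) numbers the rows within each V3 group on the fly,
# so no 'indx' column has to be added to dt first
wide <- dcast(dt, V3 ~ rowid(V3), value.var = c("V1", "V2"), sep = ".")
wide
```

The result should have the same shape as the output above, and dt itself is left unchanged.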

NOTE: Instructions to install the devel version are here

This could be done in a more compact way using getanID from splitstackshape to create the sequence variable.

 library(splitstackshape)
 dcast(getanID(df1, 'V3'), V3~.id, value.var=c('V1', 'V2'))
 #   V3 V1_1 V1_2 V1_3 V2_1 V2_2    V2_3
 #1: 11    1   NA   NA blau   NA      NA
 #2: 23    3    6   NA grün gelb      NA
 #3: 28    2    3   NA grün blau      NA
 #4: 44    1    5    1  rot grün schwarz
 #5: 88    7    7   NA blau  rot      NA
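
Both dcast results group the columns by variable (all V1 columns, then all V2 columns), whereas the question asks for V1 and V2 of each occurrence side by side. One way to reorder them afterwards is sketched below in base R. The data frame here is typed in by hand from the output above purely for illustration, and the names value_cols and idx are made up for this example.

```r
# Wide result copied from the output above as a plain data.frame (illustration only)
wide <- data.frame(
  V3   = c(11, 23, 28, 44, 88),
  V1_1 = c(1, 3, 2, 1, 7),
  V1_2 = c(NA, 6, 3, 5, 7),
  V1_3 = c(NA, NA, NA, 1, NA),
  V2_1 = c("blau", "grün", "grün", "rot", "blau"),
  V2_2 = c(NA, "gelb", "blau", "grün", "rot"),
  V2_3 = c(NA, NA, NA, "schwarz", NA)
)

value_cols <- setdiff(names(wide), "V3")

# The occurrence index is the number after the last underscore
idx <- as.integer(sub(".*_", "", value_cols))

# order() keeps ties in their original order, so V1 stays ahead of V2
wide <- wide[, c("V3", value_cols[order(idx)])]
names(wide)
```

If the result is a data.table rather than a data.frame, passing the same vector of names to setcolorder() should achieve the same thing by reference.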

data

 df1 <- data.frame(V1, V2, V3)
akrun answered Feb 11 '23 21:02