Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Merging rows with the same ID variable [duplicate]

I have a dataframe in R with 2186 obs of 38 vars. Rows have an ID variable referring to unique experiments and using

length(unique(df$ID))==nrow(df)

n_occur<-data.frame(table(df$ID))

I know 327 of my rows have repeated IDs with some IDs repeated more than once. I am trying to merge rows with the same ID as these aren't duplicates but just second, third etc. observations within a given experiment.

So for example if I had

x y ID
1 2 a
1 3 b
2 4 c
1 3 d
1 4 a
3 2 b
2 3 a

I would like to end up with

x y ID x2 y2 ID2 x3 y3 ID3
1 2 a  1  4  a   2  3  a
1 3 b  3  2  b  na na na
2 4 c  na na na na na na
1 3 d  na na na na na na

I've seen similar questions for SQL and php but this hasn't helped me with my attempts in R. Any help would be gratefully appreciated.

like image 587
suuz Avatar asked Jun 23 '26 07:06

suuz


1 Answers

You could use the enhanced dcast function from the data.table package for that where you can select multiple value variables. With setDT(mydf) you convert your dataframe to a datatable and with [, idx := 1:.N, by = ID] you add a index by ID which you use subsequently in the dcast formula:

library(data.table)
dcast(setDT(mydf)[, idx := 1:.N, by = ID], ID ~ idx, value.var = c("x","y"))

Or with the development version of data.table (v1.9.7+), you can use the new rowid function:

dcast(setDT(mydf), ID ~ rowid(ID), value.var = c("x","y"))

gives:

   ID x_1 x_2 x_3 y_1 y_2 y_3
1:  a   1   1   2   2   4   3
2:  b   1   3  NA   3   2  NA
3:  c   2  NA  NA   4  NA  NA
4:  d   1  NA  NA   3  NA  NA

Used data:

mydf <- structure(list(x = c(1L, 1L, 2L, 1L, 1L, 3L, 2L), y = c(2L, 3L, 
4L, 3L, 4L, 2L, 3L), ID = structure(c(1L, 2L, 3L, 4L, 1L, 2L, 
1L), .Label = c("a", "b", "c", "d"), class = "factor")), .Names = c("x", 
"y", "ID"), class = "data.frame", row.names = c(NA, -7L))
like image 75
Jaap Avatar answered Jun 24 '26 21:06

Jaap