I have a dataframe in R with 2186 obs of 38 vars. Rows have an ID variable referring to unique experiments and using
length(unique(df$ID))==nrow(df)
n_occur<-data.frame(table(df$ID))
I know 327 of my rows have repeated IDs with some IDs repeated more than once. I am trying to merge rows with the same ID as these aren't duplicates but just second, third etc. observations within a given experiment.
So for example if I had
x y ID
1 2 a
1 3 b
2 4 c
1 3 d
1 4 a
3 2 b
2 3 a
I would like to end up with
x y ID x2 y2 ID2 x3 y3 ID3
1 2 a 1 4 a 2 3 a
1 3 b 3 2 b na na na
2 4 c na na na na na na
1 3 d na na na na na na
I've seen similar questions for SQL and php but this hasn't helped me with my attempts in R. Any help would be gratefully appreciated.
You could use the enhanced dcast function from the data.table package for that where you can select multiple value variables. With setDT(mydf) you convert your dataframe to a datatable and with [, idx := 1:.N, by = ID] you add a index by ID which you use subsequently in the dcast formula:
library(data.table)
dcast(setDT(mydf)[, idx := 1:.N, by = ID], ID ~ idx, value.var = c("x","y"))
Or with the development version of data.table (v1.9.7+), you can use the new rowid function:
dcast(setDT(mydf), ID ~ rowid(ID), value.var = c("x","y"))
gives:
ID x_1 x_2 x_3 y_1 y_2 y_3
1: a 1 1 2 2 4 3
2: b 1 3 NA 3 2 NA
3: c 2 NA NA 4 NA NA
4: d 1 NA NA 3 NA NA
Used data:
mydf <- structure(list(x = c(1L, 1L, 2L, 1L, 1L, 3L, 2L), y = c(2L, 3L,
4L, 3L, 4L, 2L, 3L), ID = structure(c(1L, 2L, 3L, 4L, 1L, 2L,
1L), .Label = c("a", "b", "c", "d"), class = "factor")), .Names = c("x",
"y", "ID"), class = "data.frame", row.names = c(NA, -7L))
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With