I have a data.table with 1M rows and 2 columns.
Dummy data:
require(data.table)
ID <- c(1,2,3)
variable <- c("a,b","a,c","c,d")
dt <- data.table(ID,variable)
dt
> dt
ID variable
 1 a,b
 2 a,c
 3 c,d
Now I want to split the comma-separated values in the column "variable" into separate rows for each "ID", much like the "melt" function in reshape2 or melt.data.table in data.table.
Here's what I want:
ID variable
 1 a
 1 b
 2 a
 2 c
 3 c
 3 d
PS: Given the desired results, I know how to do the reverse step.
dt2 <- data.table(ID = c(1,1,2,2,3,3), variable = c("a","b","a","c","c","d"))
dt3 <- dt2[, list(variables = paste(variable, collapse = ",")), by = ID]
Any tips or suggestions?
Since strsplit is vectorised, and that's going to be the time-consuming operation here, I'd avoid using it on each group. Instead, one could first split the entire column on "," and then reconstruct the data.table as follows:
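# split every string in the column at once; returns a list of character vectors
var = strsplit(dt$variable, ",", fixed=TRUE)
# number of pieces each row was split into
len = vapply(var, length, 0L)
# repeat each ID once per piece and flatten the pieces into one column
ans = data.table(ID=rep(dt$ID, len), variable=unlist(var))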
#    ID variable
# 1:  1        a
# 2:  1        b
# 3:  2        a
# 4:  2        c
# 5:  3        c
# 6:  3        d
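For comparison, here is a minimal sketch of the per-group variant I'd avoid, next to the vectorised one, run on the dummy data from the question. The names per_group, parts and vectorised are only illustrative, and lengths() (used here in place of the vapply call) assumes R 3.2.0 or later.

```r
library(data.table)

# Dummy data from the question
dt <- data.table(ID = c(1, 2, 3), variable = c("a,b", "a,c", "c,d"))

# Per-group variant: strsplit() is called once for every ID
per_group <- dt[, list(variable = unlist(strsplit(variable, ",", fixed = TRUE))), by = ID]

# Vectorised variant: a single strsplit() call over the whole column
parts <- strsplit(dt$variable, ",", fixed = TRUE)
vectorised <- data.table(ID = rep(dt$ID, lengths(parts)), variable = unlist(parts))

# Both variants should produce the same rows in the same order
stopifnot(isTRUE(all.equal(per_group, vectorised)))
```

Both give the same result on this small example. On 1M rows I'd expect the per-group version to be slower because of the repeated function calls, but I have not timed it here, so treat that as an expectation and measure it on your own data.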