Suppose I have a data.table
in "melted" form where I have a key, and identifier and a value
library(data.table)
library(reshape2)
DT = data.table(X = c(1:5, 1:4), Y = c(rep("A", 5), rep("B", 4)), Z = rnorm(9))
DT2 = data.table(dcast(DT, X~Y))
How can I perform that sort of self join inside data.table
?
> DT
X Y Z
1: 1 A -0.19790449
2: 2 A 0.17906116
3: 3 A 0.01821837
4: 4 A 0.17309716
5: 5 A 0.05962474
6: 1 B -0.24629468
7: 2 B 0.92285734
8: 3 B 0.66002573
9: 4 B -1.01403880
> DT2
X A B
1: 1 -0.19790449 -0.2462947
2: 2 0.17906116 0.9228573
3: 3 0.01821837 0.6600257
4: 4 0.17309716 -1.0140388
5: 5 0.05962474 NA
Aside (mostly for Arun): Here is a solution I already use for melt (was written with help from Matthew D, so he should have this code), that I think replicates melt completely, and is pretty efficient. Dcast on the other hand (or should that be dtcast?) is much harder!
melt.data.table = function(data, id.vars, measure.vars,
variable.name = "variable",
..., na.rm = FALSE, value.name = "value") {
if(missing(id.vars)){
id.vars = setdiff(names(data), measure.vars)
}
if(missing(measure.vars)){
measure.vars = setdiff(names(data), id.vars)
}
dtlist = lapply(measure.vars, function(..colname) {
data[, c(id.vars, ..colname), with = FALSE][, (variable.name) := ..colname]
})
dt = rbindlist(dtlist)
setnames(dt, measure.vars[1], value.name)
if(na.rm){
return(na.omit(dt))
} else {
return(dt)
}
}
Update: faster versions of melt
and dcast
are now implemented (in C) in data.table
versions >= 1.9.0
. Check this post for more info.
Now you can just do:
dcast.data.table(DT, X~Y)
In case of dcast
alone, at the moment, it has to be written out completely (as it's not a S3 generic yet in reshape2
). We'll try to fix this as soon as possible. For melt,
you can just use melt(.)
as you'd do normally.
The general idea is this:
setkey(DT, X, Y)
DT[CJ(1:5, c("A", "B"))][, as.list(Z), by=X]
You can name the columns V1
and V2
as A
and B
using setnames
.
But this may not be efficient on large data or when the cast formula is complex. Or rather I should say, it could be much more efficient. We're in the process of finding such an implementation to integrate melt and cast on to data.table. Until then, you could get around this as above.
I'll update this post once we've made significant progress with melt/cast.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With