> library(data.table)
> A <- data.table(x = c(1,1,2,2), y = c(1,2,1,2), v = c(0.1,0.2,0.3,0.4))
> A
x y v
1: 1 1 0.1
2: 1 2 0.2
3: 2 1 0.3
4: 2 2 0.4
> B <- dcast(A, x~y)
Using v as value column: use value.var to override.
> B
x 1 2
1 1 0.1 0.2
2 2 0.3 0.4
Apparently I can reshape a data.table from long to wide using, for example, dcast from the reshape2 package. But data.table comes with an overloaded bracket operator offering arguments such as 'by', which makes me wonder whether the same reshape can be achieved using only this data.table-specific functionality.
Just one random example from the manual:
DT[,lapply(.SD,sum),by=x]
That looks awesome, but I don't fully understand the usage yet.
I found neither a way nor an example for this, so maybe it is just not possible, or maybe it isn't even supposed to be. A definite "no, it is not possible because ..." is of course also a valid answer.
I'll pick an example with unequal groups so that it's easier to illustrate for the general case:
A <- data.table(x=c(1,1,1,2,2), y=c(1,2,3,1,2), v=(1:5)/5)
> A
x y v
1: 1 1 0.2
2: 1 2 0.4
3: 1 3 0.6
4: 2 1 0.8
5: 2 2 1.0
The first step is to get the number of elements/entries for each group of "x" to be the same. Here, for x=1 there are 3 values of y, but only 2 for x=2. So, we'll have to fix that first with NA for x=2, y=3.
setkey(A, x, y)
A[CJ(unique(x), unique(y))]
Now, to get it to wide format, we should group by "x" and use as.list on v as follows:
out <- A[CJ(unique(x), unique(y))][, as.list(v), by=x]
x V1 V2 V3
1: 1 0.2 0.4 0.6
2: 2 0.8 1.0 NA
Now, you can set the names of the reshaped columns by reference with setnames as follows:
setnames(out, c("x", as.character(unique(A$y))))
x 1 2 3
1: 1 0.2 0.4 0.6
2: 2 0.8 1.0 NA
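The steps above can be pulled together into one short sketch. It keeps the same approach, but stores the sorted y values in a helper vector (`ys`, a name introduced here only for illustration) and reuses it for both the cross join and the column names. This relies on the assumption that CJ() sorts its arguments by default, so sorting the labels the same way keeps them aligned with the columns.

```r
library(data.table)

A <- data.table(x = c(1, 1, 1, 2, 2),
                y = c(1, 2, 3, 1, 2),
                v = (1:5) / 5)
setkey(A, x, y)

# Sorted unique y values, used for both the grid and the final column names.
ys <- sort(unique(A$y))

# Every (x, y) combination; combinations missing from A come back with v = NA.
grid <- CJ(x = unique(A$x), y = ys)

# One row per x, one column per y value.
out <- A[grid][, as.list(v), by = x]

# Rename V1, V2, ... by reference using the same ordering as the grid.
setnames(out, c("x", as.character(ys)))
out
```

Reusing one vector for the grid and the labels is a small design choice that removes one way for the names and the data to get out of step.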
Use dcast() (now a default data.table method, from version 1.9.5; earlier versions use dcast.data.table) as in
> dcast(A,x~y)
Using 'v' as value column. Use 'value.var' to override
x 1 2 3
1: 1 0.2 0.4 0.6
2: 2 0.8 1.0 NA
This is fast and obviates the need for setnames().
It is also especially helpful when y in the above example is a factor variable with character levels (e.g. 'Low', 'Medium', 'High'), because CJ() may not return the wide data with variables in the order that setnames() expects, and you can end up with badly mislabeled data.
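As a rough illustration of the factor case, here is a minimal sketch. The table `B` and its levels are made up for this example and are not part of the original data. Passing value.var explicitly is optional, but it names the value column and avoids the informational message.

```r
library(data.table)

# Hypothetical variant of A in which y is a factor with character levels.
B <- data.table(
  x = c(1, 1, 1, 2, 2),
  y = factor(c("Low", "Medium", "High", "Low", "Medium"),
             levels = c("Low", "Medium", "High")),
  v = (1:5) / 5
)

# dcast() names each new column after the y value it holds.
wide <- dcast(B, x ~ y, value.var = "v")

# Values can therefore be looked up by label rather than by position.
wide[x == 1, High]   # the v recorded for x = 1, y = "High"
wide[x == 2, High]   # NA, because that combination does not occur in B
```

Because the column names come from the data itself, there is no separate renaming step where labels and values could drift apart.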