Reshape long structured data.table into a wide structure using data.table functionality?

Question

> library(data.table)
> A <- data.table(x = c(1,1,2,2), y = c(1,2,1,2), v = c(0.1,0.2,0.3,0.4))
> A
   x y   v
1: 1 1 0.1
2: 1 2 0.2
3: 2 1 0.3
4: 2 2 0.4
> B <- dcast(A, x~y)
Using v as value column: use value.var to override.
> B
  x   1   2
1 1 0.1 0.2
2 2 0.3 0.4

Apparently I can reshape a data.table from long to wide using, for example, dcast from the reshape2 package. But data.table comes with an overloaded bracket operator that offers parameters like 'by' for grouping, which makes me wonder whether the same can be achieved using only data.table-specific functionality?

Just one random example from the manual:

DT[,lapply(.SD,sum),by=x]

That looks awesome - but I don't fully understand the usage yet.

I found neither a way nor an example for this, so maybe it is just not possible, or maybe it isn't even supposed to be. A definite "no, it is not possible because ..." would of course also be a valid answer.

Arun · Accepted Answer

I'll pick an example with unequal groups so that it's easier to illustrate for the general case:

A <- data.table(x=c(1,1,1,2,2), y=c(1,2,3,1,2), v=(1:5)/5)
> A
   x y   v
1: 1 1 0.2
2: 1 2 0.4
3: 1 3 0.6
4: 2 1 0.8
5: 2 2 1.0

The first step is to get the number of elements/entries for each group of "x" to be the same. Here, for x=1 there are 3 values of y, but only 2 for x=2. So, we'll have to fix that first with NA for x=2, y=3.

setkey(A, x, y)
A[CJ(unique(x), unique(y))]
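
To make the effect of this step visible, here is a minimal sketch of the same cross join written without setkey(). It assumes a data.table version that supports the on= argument; the names grid and filled are only illustrative and not part of the original answer.

```r
library(data.table)

A <- data.table(x = c(1, 1, 1, 2, 2), y = c(1, 2, 3, 1, 2), v = (1:5) / 5)

# CJ() builds every combination of its arguments (a cross join), sorted.
# Naming the arguments lets `on =` match them to the columns of A,
# so this variant does not need a keyed table.
grid <- CJ(x = unique(A$x), y = unique(A$y))

# Joining A to the grid keeps all 6 combinations; the one combination
# that is missing from A (x = 2, y = 3) comes back with v = NA.
filled <- A[grid, on = c("x", "y")]
print(filled)
```

Every group of x now has the same number of rows, which is what the next step relies on.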

Now, to get it to wide format, we should group by "x" and use as.list on v as follows:

out <- A[CJ(unique(x), unique(y))][, as.list(v), by=x]
   x  V1  V2  V3
1: 1 0.2 0.4 0.6
2: 2 0.8 1.0  NA
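
The reason as.list(v) spreads the values across columns is that each element of a list returned in j becomes its own column, whereas a plain vector becomes rows. A small sketch on a hypothetical table D (not from the question) shows the difference:

```r
library(data.table)

# Hypothetical two-group table, used only to contrast the two forms of j
D <- data.table(g = c("a", "a", "b", "b"), v = c(1, 2, 3, 4))

# A vector in j: one row per value, so the result stays long
as_rows <- D[, v, by = g]

# A list in j: one column per list element, so each group becomes one row
as_columns <- D[, as.list(v), by = g]

print(as_rows)
print(as_columns)
```

This only works cleanly when every group contributes the same number of values, which is why the cross join comes first.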

Now, you can set the names of the reshaped columns by reference with setnames as follows:

setnames(out, c("x", as.character(unique(A$y))))

   x   1   2   3
1: 1 0.2 0.4 0.6
2: 2 0.8 1.0  NA
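
One caveat with this naming step: CJ() sorts its output, while unique() returns values in order of first appearance. The two happen to agree in the example above, but not in general. The sketch below uses a hypothetical table A2 in which the first group lacks the smallest y, and sorts the names as a precaution; this is a suggested variation, not part of the original answer.

```r
library(data.table)

# Hypothetical data: x = 1 has no y = 1, so unique(A2$y) is 2, 3, 1
A2 <- data.table(x = c(1, 1, 2, 2), y = c(2, 3, 1, 2), v = c(0.1, 0.2, 0.3, 0.4))
setkey(A2, x, y)

# CJ() sorts the combinations, so the reshaped columns follow y = 1, 2, 3
wide <- A2[CJ(unique(A2$x), unique(A2$y))][, as.list(v), by = x]

# Sort the unique y values so the labels line up with the CJ() order
setnames(wide, c("x", as.character(sort(unique(A2$y)))))
print(wide)
```

Without the sort(), the columns here would be labeled 2, 3, 1 and the values would sit under the wrong names.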

Gabi · Answer

Use dcast() (now a default data.table method, from version 1.9.5; earlier versions use dcast.data.table) as in

> dcast(A,x~y)
Using 'v' as value column. Use 'value.var' to override
   x   1   2   3
1: 1 0.2 0.4 0.6
2: 2 0.8 1.0  NA

This is fast and removes the need to call setnames().

It is also especially helpful when y in the above example is a factor variable with character levels -- e.g. 'Low', 'Medium', 'High' -- because CJ() may not return the wide data with variables in the order that setnames() expects, and you can end up with your data mislabeled badly.
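
As an illustration of the factor case, here is a minimal sketch with a hypothetical table L whose y column has the levels 'Low', 'Medium', 'High'. It assumes a data.table version in which dcast() has a data.table method, and passes value.var explicitly to avoid the "Using 'v' as value column" message.

```r
library(data.table)

# Hypothetical data: y is a factor whose level order is not alphabetical
L <- data.table(
  x = c(1, 1, 1, 2, 2),
  y = factor(c("Low", "Medium", "High", "Low", "Medium"),
             levels = c("Low", "Medium", "High")),
  v = (1:5) / 5
)

# dcast() takes the new column names from the values of y itself,
# so the labels cannot drift out of step with the data.
W <- dcast(L, x ~ y, value.var = "v")
print(W)
```

The missing combination (x = 2, y = 'High') is filled with NA, as in the numeric example.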

Tags: r, data.table

Raffael