Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pivot a large data.table

Tags:

r

data.table

I have a large data table in R:

library(data.table)
set.seed(1234)
n <- 1e+07*2
DT <- data.table(
  ID=sample(1:200000, n, replace=TRUE), 
  Month=sample(1:12, n, replace=TRUE),
  Category=sample(1:1000, n, replace=TRUE),
  Qty=runif(n)*500,
  key=c('ID', 'Month')
)
dim(DT)

I'd like to pivot this data.table, such that Category becomes a column. Unfortunately, since the number of categories isn't constant within groups, I can't use this answer.

Any ideas how I might do this?

/edit: Based on joran's comments and flodel's answer, we're really reshaping the following data.table:

agg <- DT[, list(Qty = sum(Qty)), by = c("ID", "Month", "Category")]

This reshape can be accomplished a number of ways (I've gotten some good answers so far), but what I'm really looking for is something that will scale well to a data.table with millions of rows and hundreds to thousands of categories.

like image 388
Zach Avatar asked Apr 04 '13 22:04

Zach


2 Answers

data.table implements faster versions of melt/dcast data.table specific methods (in C). It also adds additional features for melting and casting multiple columns. Please see the Efficient reshaping using data.tables vignette.

Note that we don't need to load reshape2 package.

library(data.table)
set.seed(1234)
n <- 1e+07*2
DT <- data.table(
  ID=sample(1:200000, n, replace=TRUE), 
  Month=sample(1:12, n, replace=TRUE),
  Category=sample(1:800, n, replace=TRUE), ## to get to <= 2 billion limit
  Qty=runif(n),
  key=c('ID', 'Month')
)
dim(DT)

> system.time(ans <- dcast(DT, ID + Month ~ Category, fun=sum))
#   user  system elapsed
# 65.924  20.577  86.987
> dim(ans)
# [1] 2399401     802
like image 148
Arun Avatar answered Sep 29 '22 13:09

Arun


Like that?

agg <- DT[, list(Qty = sum(Qty)), by = c("ID", "Month", "Category")]

reshape(agg, v.names = "Qty", idvar = c("ID", "Month"),
        timevar = "Category", direction = "wide")
like image 41
flodel Avatar answered Sep 29 '22 12:09

flodel