Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

fuse melt/dcast in one step

Tags:

r

data.table

I have the following data.table

library(data.table)
testdt <- data.table(var1=rep(c("a", "b"), e=3), p1=1:6, p2=11:16)
#   var1 p1 p2
#1:    a  1 11
#2:    a  2 12
#3:    a  3 13
#4:    b  4 14
#5:    b  5 15
#6:    b  6 16

I need to have the median value of each var1 for each p*, the p* should be in rows and the different unique values of var1 in columns.
So, I'm looking for this output:

 variable  a  b
1       p1  2  5
2       p2 12 15

The simplest way I found to get it is:

dcast(melt(testdt, id.vars = "var1", measure.vars = c("p1", "p2")),
      variable ~ var1,
      value.var = "value",
      fun.aggregate = median)

But I have a feeling I'm missing something here (like the most suited function) so I'd love to know a direct way (unique function) to do the same.

I'm aware recast of the reshape2 package could do the trick with recast(testdt, variable~var1, fun=median, id.var="var1"), but I'd like to avoid loading another package.

Edit:

I'm looking for a solution both simple and efficient. This would be applied on a list of ~40 tables with ~300 columns and ~80 rows

like image 900
Cath Avatar asked Jan 22 '16 10:01

Cath


1 Answers

If speed is a primary concern, there is a slight speed boost of around 23% (although in milliseconds) if you calculate the median first. This delta is also likely to increase as the size of your data set grows as there is less data to melt.

library(data.table)
dcast(melt(testdt[, lapply(.SD, median), by=var1], id.vars="var1"), variable ~ var1))

benchmark

Unit: milliseconds

expr                         min       lq    mean   median       uq      max neval
fun.aggregate = median  4.221654 4.453063 4.87418 4.510775 4.579718 35.28569  1000
lapply(.SD, median)     3.196289 3.410711 3.77483 3.461073 3.523096 22.78637  1000
like image 57
manotheshark Avatar answered Nov 04 '22 21:11

manotheshark