I have the following data.table:
library(data.table)
testdt <- data.table(var1=rep(c("a", "b"), e=3), p1=1:6, p2=11:16)
#    var1 p1 p2
# 1:    a  1 11
# 2:    a  2 12
# 3:    a  3 13
# 4:    b  4 14
# 5:    b  5 15
# 6:    b  6 16
I need the median of each p* column for each value of var1; the p* columns should be in rows and the unique values of var1 in columns.
So, I'm looking for this output:
  variable  a  b
1       p1  2  5
2       p2 12 15
The simplest way I found to get it is:
dcast(melt(testdt, id.vars = "var1", measure.vars = c("p1", "p2")),
variable ~ var1,
value.var = "value",
fun.aggregate = median)
But I have a feeling I'm missing something here (like a better-suited function), so I'd love to know a more direct way (a single function) to do the same.
I'm aware that recast from the reshape2 package could do the trick with recast(testdt, variable~var1, fun=median, id.var="var1"), but I'd like to avoid loading another package.
Edit:
I'm looking for a solution that is both simple and efficient. It would be applied to a list of ~40 tables with ~300 columns and ~80 rows each.
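For context, I would apply the recipe to each table roughly like this (table_list stands in for the actual list of data.tables, each with its own var1 column; this is a sketch only):
library(data.table)
# apply the same melt/dcast summary to every table in the list
results <- lapply(table_list, function(dt)
  dcast(melt(dt, id.vars = "var1"), variable ~ var1,
        value.var = "value", fun.aggregate = median))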
If speed is a primary concern, there is a slight speed boost of around 23% (albeit measured in milliseconds) if you calculate the medians first and reshape afterwards. This gap is also likely to grow with the size of your data set, since there is less data to melt.
library(data.table)
dcast(melt(testdt[, lapply(.SD, median), by=var1], id.vars="var1"), variable ~ var1)
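To see why this is cheaper: the inner expression already reduces each group to a single row of column medians before anything is reshaped, so melt/dcast only has one row per group to handle. A quick look at that intermediate result for the example testdt:
testdt[, lapply(.SD, median), by = var1]
#    var1 p1 p2
# 1:    a  2 12
# 2:    b  5 15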
Benchmark:
Unit: milliseconds
                   expr      min       lq    mean   median       uq      max neval
 fun.aggregate = median 4.221654 4.453063 4.87418 4.510775 4.579718 35.28569  1000
    lapply(.SD, median) 3.196289 3.410711 3.77483 3.461073 3.523096 22.78637  1000
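The timings above can be reproduced with something along these lines (a sketch using the microbenchmark package; times = 1000 matches the neval column, but the exact setup of the original run is an assumption):
library(microbenchmark)
library(data.table)
microbenchmark(
  # original approach: aggregate inside dcast
  "fun.aggregate = median" = dcast(
    melt(testdt, id.vars = "var1", measure.vars = c("p1", "p2")),
    variable ~ var1, value.var = "value", fun.aggregate = median),
  # compute the medians first, then reshape
  "lapply(.SD, median)" = dcast(
    melt(testdt[, lapply(.SD, median), by = var1], id.vars = "var1"),
    variable ~ var1),
  times = 1000
)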