I'm struggling to use data.table to summarize results of vector functions, something that's easy in ddply.
Issue 1: aggregate with an (expensive) function with vector output
dt <- data.table(x=1:20,y=rep(c("a","b"),each=10))
This ddply command produces what I want:
ddply(dt,~y,function(dtbit) quantile(dtbit$x))
This data.table command does not do what I want:
dt[,quantile(x),by=list(y)]
I can hack at data.table like so:
dt[,list("0%"=quantile(x,0),"25%"=quantile(x,0.25),
"50%"=quantile(x,0.5)),by=list(y)]
But that is verbose, and it would also be slow if the vector function "quantile" were slow, since it is called once per output column.
A similar example is:
dt$z <- rep(sqrt(1:10),2)
ddply(dt,~y,function(dtbit) coef(lm(z~x,dtbit)))
Issue 2: Using a function with both vector input and output
xzsummary <- function(dtbit) t(summary(dtbit[,"x"]-dtbit[,"z"]))
ddply(dt,~y,xzsummary )
Can I do that kind of thing easily in data.table?
Apologies if these questions are already prominently answered.
This is a similar, not identical, issue to: data.table aggregations that return vectors, such as scale()
> dt[ , as.list(quantile(x)),by=y]
y 0% 25% 50% 75% 100%
1: a 1 3.25 5.5 7.75 10
2: b 11 13.25 15.5 17.75 20
I tried using rbind, but that failed to generate the by-y arrangement I was thinking you wanted. The trick with as.list (vs. list) is that it constructs a multi-element list when given a vector, whereas list only puts the vector into a single-element list. as.list acts like sapply(x, list):
> dt[ , sapply(quantile(x), list), by=y]
y 0% 25% 50% 75% 100%
1: a 1 3.25 5.5 7.75 10
2: b 11 13.25 15.5 17.75 20
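The difference between list and as.list described above can be checked outside data.table. This is a minimal sketch in base R; the variable names q, one_elem and per_elem are only illustrative and do not come from the original post.

```r
# quantile() returns a named numeric vector of length 5
q <- quantile(1:10)

# list() wraps the whole vector in a single list element
one_elem <- list(q)

# as.list() makes one list element per vector entry and keeps the names
per_elem <- as.list(q)

print(length(one_elem))
print(length(per_elem))
print(names(per_elem))
```

Since data.table builds one output column per list element returned from j, the per-element form is presumably what gives one column per quantile.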
Your target solution:
> ddply(dt,~y,function(dtbit) quantile(dtbit$x))
y 0% 25% 50% 75% 100%
1 a 1 3.25 5.5 7.75 10
2 b 11 13.25 15.5 17.75 20
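The same as.list idiom appears to carry over to the other cases in the question: the lm coefficients, and a function that takes two columns and returns a vector. The sketch below rests on a few assumptions: z is added with := instead of dt$z, and xz_stats is a hypothetical helper standing in for the question's xzsummary, not a reproduction of it. Each function is called once per group, so an expensive function is not re-run for every output column.

```r
library(data.table)

dt <- data.table(x = 1:20, y = rep(c("a", "b"), each = 10))
dt[, z := rep(sqrt(1:10), 2)]

# Regression coefficients per group: coef() returns a named vector,
# and as.list() spreads it across columns.
coefs <- dt[, as.list(coef(lm(z ~ x))), by = y]

# Hypothetical helper (not from the question): two vectors in, named vector out.
xz_stats <- function(x, z) {
  d <- x - z
  c(min = min(d), mean = mean(d), max = max(d))
}
xz_by_y <- dt[, as.list(xz_stats(x, z)), by = y]

print(coefs)
print(xz_by_y)
```

Both results should have one row per value of y, with column names taken from the names of the returned vector.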
I was kind of proud of that solution, but mindful of fortunes::fortune("Liaw-Baron principle"):
Lastly, by what we could call the 'Liaw-Baron principle', every question that can be asked has in fact already been asked. -- Dirk Eddelbuettel (citing Andy Liaw's and Jonathan Baron's opinion on unique questions on R-help), R-help (January 2006)
I did a search on: [r] data.table as.list, and find that I am by no means the first to post this strategy on SO:
Tabulate a data frame in R
Using ave() with function which returns a vector
create a formula in a data.table environment in R
I don't really know if this question would be considered a duplicate, but I am particularly grateful to @G.Grothendieck for the last one. It may be where I picked up the strategy. There were about 125 hits to that search and I've only gone through the first 20 to gather those examples, so there may be some more pearls that I haven't uncovered.