I would like to split my data frame using a couple of columns and call, let's say, fivenum on each group:
aggregate(Petal.Width ~ Species, iris, function(x) summary(fivenum(x)))
The returned value is a data.frame with only 2 columns, the second being a matrix. How can I turn it into normal columns of a data.frame?
Update
I want something like the following with less code using fivenum
library(plyr)
ddply(iris, .(Species), summarise,
      Min = min(Petal.Width),
      Q1  = quantile(Petal.Width, .25),
      Med = median(Petal.Width),
      Q3  = quantile(Petal.Width, .75),
      Max = max(Petal.Width)
)
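For reference, the matrix column produced by the aggregate call can be spread into ordinary columns in base R. This is a minimal sketch, assuming the common do.call(data.frame, ...) idiom is acceptable; the names res and flat and the column labels are chosen here for illustration only.

```r
# aggregate() stores the five statistics in a single matrix column
res <- aggregate(Petal.Width ~ Species, iris, fivenum)

# data.frame() expands a matrix argument into one column per matrix column
flat <- do.call(data.frame, res)

# Illustrative labels; any syntactically valid names would do
names(flat) <- c("Species", "Min", "Q1", "Med", "Q3", "Max")
flat
```

After this, flat has six plain columns and no embedded matrix.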
Here is a solution using data.table (while not specifically requested, it is an obvious complement to or replacement for aggregate or ddply). As well as being slightly long to code, repeatedly calling quantile will be inefficient, because each call sorts the data again.
library(data.table)
Tukeys_five <- c("Min","Q1","Med","Q3","Max")
IRIS <- data.table(iris)
# this will create the wide data.table
lengthBySpecies <- IRIS[,as.list(fivenum(Sepal.Length)), by = Species]
# and you can rename the columns from V1, ..., V5 to something nicer
setnames(lengthBySpecies, paste0('V',1:5), Tukeys_five)
lengthBySpecies
      Species Min  Q1 Med  Q3 Max
1:     setosa 4.3 4.8 5.0 5.2 5.8
2: versicolor 4.9 5.6 5.9 6.3 7.0
3:  virginica 4.9 6.2 6.5 6.9 7.9
Or, using a single call to quantile with the appropriate probs argument:
IRIS[, as.list(quantile(Sepal.Length, probs = seq(0, 1, by = 0.25))), by = Species]
      Species  0%   25% 50% 75% 100%
1:     setosa 4.3 4.800 5.0 5.2  5.8
2: versicolor 4.9 5.600 5.9 6.3  7.0
3:  virginica 4.9 6.225 6.5 6.9  7.9
Note that the names of the created columns are not syntactically valid R names (they start with a digit and contain %), although you could go through a similar renaming using setnames.
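The renaming mentioned above could look like the following. It is a minimal sketch, assuming the data.table package is installed; quantBySpecies is a name introduced here for illustration, and the old column names are the percent labels that quantile produces for these probabilities.

```r
library(data.table)

Tukeys_five <- c("Min", "Q1", "Med", "Q3", "Max")
IRIS <- data.table(iris)

# One quantile() call per group; the columns come back named "0%", "25%", ...
quantBySpecies <- IRIS[, as.list(quantile(Sepal.Length, probs = seq(0, 1, by = 0.25))),
                       by = Species]

# Rename by reference from the percent labels to syntactically valid names
setnames(quantBySpecies, c("0%", "25%", "50%", "75%", "100%"), Tukeys_five)
quantBySpecies
```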
EDIT
Interestingly, quantile will set the names of the resulting vector when names = TRUE (the default), and this will copy (slowing down the number crunching and consuming memory; it even warns you in the help, fancy that!).
Thus, you should probably use:
IRIS[, as.list(quantile(Sepal.Length, probs = seq(0, 1, by = 0.25), names = FALSE)), by = Species]
Or, if you wanted to return the named list without R copying internally:
IRIS[, {
  quant <- as.list(quantile(Sepal.Length, probs = seq(0, 1, by = 0.25), names = FALSE))
  setattr(quant, 'names', Tukeys_five)
  quant
}, by = Species]
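To see what the names argument changes in the returned vector, here is a small base R check. It is only a sketch: x is a made-up example vector, and it compares the results rather than measuring speed or memory.

```r
x <- c(1, 2, 3, 4, 5)  # hypothetical example data

named   <- quantile(x, probs = seq(0, 1, by = 0.25))                 # names = TRUE is the default
unnamed <- quantile(x, probs = seq(0, 1, by = 0.25), names = FALSE)

names(named)    # "0%" "25%" "50%" "75%" "100%"
names(unnamed)  # NULL

# The numeric values are the same either way
all.equal(unname(named), unnamed)
```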