I'm trying to use the dplyr package to apply a function to all columns of a data.frame that are not used for grouping. With aggregate() I would do it like this:
aggregate(. ~ Species, data = iris, mean)
where mean is applied to all columns not used for grouping. (Yes, I know I can use aggregate, but I'm trying to understand dplyr.)
I can use summarize like this:
species <- group_by(iris, Species)
summarize(species,
          Sepal.Length = mean(Sepal.Length),
          Sepal.Width = mean(Sepal.Width))
But is there a way to have mean() applied to all columns that are not grouped, similar to the . ~ notation of aggregate()? I have a data.frame with 30 columns that I want to aggregate, so writing out the individual statements is not ideal.
If you're willing to try an experimental version of dplyr, you can use the new (and still experimental) summarise_each():
devtools::install_github("hadley/dplyr", ref = "colwise")
library(dplyr)
iris %.%
  group_by(Species) %.%
  summarise_each(funs(mean))
## Source: local data frame [3 x 5]
##
##      Species Sepal.Length Sepal.Width Petal.Length Petal.Width
## 1     setosa        5.006       3.428        1.462       0.246
## 2 versicolor        5.936       2.770        4.260       1.326
## 3  virginica        6.588       2.974        5.552       2.026
iris %.%
  group_by(Species) %.%
  summarise_each(funs(min, max))
## Source: local data frame [3 x 9]
##
##      Species Sepal.Length_min Sepal.Width_min Petal.Length_min
## 1     setosa              4.3             2.3              1.0
## 2 versicolor              4.9             2.0              3.0
## 3  virginica              4.9             2.2              4.5
## Variables not shown: Petal.Width_min (dbl), Sepal.Length_max (dbl),
##   Sepal.Width_max (dbl), Petal.Length_max (dbl), Petal.Width_max (dbl)
Feedback much appreciated!
This will appear in dplyr 0.2.
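For anyone reading this on a much newer dplyr: as far as I know, the `%.%` operator, `summarise_each()` and `funs()` were all superseded later on. A rough sketch of the same two summaries, assuming dplyr 1.0.0 or later (where `across()` and the `%>%` pipe are available), might look like this. The names `by_species` and `ranges` are just placeholders for this example.

```r
library(dplyr)

# Assumption: dplyr >= 1.0.0, which provides across().
# Inside summarise(), everything() covers all non-grouping columns.
by_species <- iris %>%
  group_by(Species) %>%
  summarise(across(everything(), mean))

# Several functions at once: a named list produces columns
# such as Sepal.Length_min and Sepal.Length_max.
ranges <- iris %>%
  group_by(Species) %>%
  summarise(across(everything(), list(min = min, max = max)))
```

The grouping column is left alone, so there is no need to list the 30 columns by hand.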
This will get you almost all the way in dplyr.
h = iris %.%
  group_by(Species) %.%
  do(function(d){
    sapply(Filter(is.numeric, d), mean)
  })
as.data.frame(h)
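To see what the function passed to do() computes for each group, here is a small base R sketch of the same split-apply-combine idea. It is only an illustration, not the dplyr internals; `groups`, `means` and `result` are names made up for this example.

```r
# Split iris into one data.frame per species.
groups <- split(iris, iris$Species)

# For each group, keep the numeric columns and take the mean of each one.
means <- sapply(groups, function(d) sapply(Filter(is.numeric, d), mean))

# sapply() returns one column per group; transpose so each row is a species.
result <- as.data.frame(t(means))
```

Filter(is.numeric, d) drops the Species factor, which is why mean() can be applied to everything that remains.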