Here's what I'm trying to do. My data frame has a factor variable, "country", and I want to split the data frame based on country. Then, I want to take the column mean over every variable for every country's data frame.
Data here: https://github.com/pourque/country-data
I've done this so far...
myList <- split(df1, df1$country)
for(i in 1:length(myList)) {
  aggregate <- mapply(myList[[i]][,-c(38:39)], colMeans)
}
(I'm not including the 38th and 39th columns because those are factors.)
I've read this (function over more than one list), which makes me think mapply is the answer here... but I'm getting this error:
Error in match.fun(FUN) :
'myList[[i]][, -c(38:39)]' is not a function, character or symbol
Maybe I'm formatting it incorrectly?
It's straightforward in base R using aggregate, without the need to split the data.frame into a list beforehand. Here's an example using the built-in iris data, where you compute the mean of all variables except those in the first and second columns, grouped by Species:
data(iris)
aggregate(. ~ Species, iris[-(1:2)], mean)
#      Species Petal.Length Petal.Width
#1      setosa        1.462       0.246
#2  versicolor        4.260       1.326
#3   virginica        5.552       2.026
The . inside aggregate specifies that you want to use all remaining columns of the data.frame except the grouping variable (Species in this case). And because you pass iris[-(1:2)] as the input data, the first and second columns are not used either.
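To make the role of the dot concrete, here is a small sketch on the same iris data. It spells out the left-hand side with cbind() and should give the same result as the dot version. The names via_dot and via_cbind are only for illustration.

```r
data(iris)

# Dot version: every column left in the data except Species is averaged
via_dot <- aggregate(. ~ Species, iris[-(1:2)], mean)

# Explicit version: name the columns to average on the left-hand side
via_cbind <- aggregate(cbind(Petal.Length, Petal.Width) ~ Species, iris, mean)

# Both calls should return the same data.frame
all.equal(via_dot, via_cbind)
```

The explicit form can be handy when you only want a few columns and would rather not subset the data first.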
For your data, it should then be something like:
aggregate(. ~ country, df1[-c(38:39)], mean)
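If you would rather keep the split-based approach from the question, note that mapply expects the function as its first argument (FUN), which is why the data frame ended up being passed to match.fun. For a single list, sapply is probably enough. The following is a rough sketch on iris rather than on your data, and the names by_species and means are just placeholders:

```r
data(iris)

# Split the numeric columns (drop the factor in column 5) by the grouping factor
by_species <- split(iris[-5], iris$Species)

# colMeans is applied to each group's data.frame; sapply returns one column
# per group, so transpose to get one row per group
means <- t(sapply(by_species, colMeans))
means
```

For your data the same idea would presumably start from split(df1[-c(38:39)], df1$country), assuming every remaining column is numeric, since colMeans fails on non-numeric input.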
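# Alternative with dplyr. summarise_each() and funs() are deprecated;
# across() is the current equivalent and needs dplyr >= 1.0.0.
library(dplyr)
df1 %>%
  select(-age, -gender) %>%               # drop the non-numeric columns
  group_by(country) %>%
  summarise(across(everything(), mean))   # mean of every non-grouping column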