I have a data.frame with many columns (~50). Some of them are character, some are numeric and 3 of them I use for grouping.
I need to:
Let's say, we're using modified iris data as below:
data(iris)
iris$year <- rep(c(2000,3000),each=25) ## for grouping
iris$color <- rep(c("red","green","blue"),each=50) ## character column
iris[1,] <- NA ## introducing NAs
I have ~50 columns in total, numeric and character mixed together. I've been trying something like:
giris <- group_by(iris, Species, year)
cls <- unlist(sapply(giris, class)) ## find out classes
action <- ifelse(cls == "numeric", "mean", "first")
action <- paste(action)
summarise_each(giris, action)
What I get is means for all columns in a group followed by columns with the first values in respective group. And NAs are not handled... Which is not exactly what I seek...
Help anyone?
You could try this with an if
/else
in the funs
of summarise_each
:
iris %>%
group_by(Species, year) %>%
summarise_each(funs(if(is.numeric(.)) mean(., na.rm = TRUE) else first(.)))
Since you have some NA's also in grouping columns, you could add a filter
statement:
iris %>%
filter(!is.na(Species) & !is.na(year)) %>%
group_by(Species, year) %>%
summarise_each(funs(if(is.numeric(.)) mean(., na.rm = TRUE) else first(.)))
#Source: local data frame [6 x 7]
#Groups: Species [?]
#
# Species year Sepal.Length Sepal.Width Petal.Length Petal.Width color
# (fctr) (dbl) (dbl) (dbl) (dbl) (dbl) (chr)
#1 setosa 2000 5.025 3.479167 1.4625 0.250 red
#2 setosa 3000 4.984 3.376000 1.4640 0.244 red
#3 versicolor 2000 6.012 2.776000 4.3120 1.344 green
#4 versicolor 3000 5.860 2.764000 4.2080 1.308 green
#5 virginica 2000 6.576 2.928000 5.6400 2.044 blue
#6 virginica 3000 6.600 3.020000 5.4640 2.008 blue
To avoid potential NA's in the color column (or any non-numeric columns), you could modify it to first(na.omit(.))
.
You could also try data.table
:
library(data.table)
setDT(iris)
iris[!is.na(Species) & !is.na(year), lapply(.SD, function(x) {
if(is.numeric(x)) mean(x, na.rm = TRUE) else x[!is.na(x)][1L]}),
by = list(Species, year)]
# Species year Sepal.Length Sepal.Width Petal.Length Petal.Width color
#1: setosa 2000 5.025 3.479167 1.4625 0.250 red
#2: setosa 3000 4.984 3.376000 1.4640 0.244 red
#3: versicolor 2000 6.012 2.776000 4.3120 1.344 green
#4: versicolor 3000 5.860 2.764000 4.2080 1.308 green
#5: virginica 2000 6.576 2.928000 5.6400 2.044 blue
#6: virginica 3000 6.600 3.020000 5.4640 2.008 blue
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With