
dplyr conditional summarise function

Tags: r, dplyr

I have this situation where I need a different summary function based on a condition. For example, using iris, say for some reason I wanted the sum of the petal width if the species was setosa, otherwise I wanted the mean of the petal width.

Naively, I wrote this using case_when, which does not work:

library(dplyr)

iris <- tibble::as_tibble(iris)

iris %>% 
  group_by(Species) %>% 
  summarise(pwz = case_when(
    Species == "setosa" ~ sum(Petal.Width, na.rm = TRUE),
    TRUE                ~ mean(Petal.Width, na.rm = TRUE)))

Error in summarise_impl(.data, dots) : Column pwz must be length 1 (a summary value), not 50

I eventually found a workaround: compute a summary with each method, then use mutate to pick the one I actually want:

iris %>% 
  group_by(Species) %>% 
  summarise(pws = sum(Petal.Width, na.rm = TRUE),
            pwm = mean(Petal.Width, na.rm = TRUE)) %>% 
  mutate(pwz = case_when(
    Species == "setosa" ~ pws,
    TRUE                ~ pwm)) %>% 
  select(-pws, -pwm)

But that seems more than a bit awkward: it creates all these summarised values only to pick one at the end, and my real case_when is a lot more complicated. Can I not use case_when inside summarise? Do I have my syntax wrong? Any help is appreciated!

Edit: I suppose I should have pointed out that I have multiple conditions/functions (just assume I've got, depending on the variable, some that need mean, sum, max, min, or other summary).

michdn asked Jan 29 '23 05:01


1 Answer

This is pretty easy with data.table:

library(data.table)
iris2 <- as.data.table(iris)

iris2[, if(Species == 'setosa') sum(Petal.Width) 
        else mean(Petal.Width)
      , by = Species]

More concisely, but maybe not as clearly:

iris2[, ifelse(Species == 'setosa', sum, mean)(Petal.Width)
      , by = Species]

With dplyr you can do:

iris %>% 
  group_by(Species) %>% 
  summarise(pwz = if_else(first(Species == "setosa")
                          , sum(Petal.Width)
                          , mean(Petal.Width)))
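
The case_when attempt in the question fails because Species == "setosa" is evaluated for every row in the group, so the result has 50 elements instead of one; wrapping the condition in first() reduces it to a single value. For the case from the question's edit, where several groups each need a different function, one possible sketch is to pick the function with switch() and then call it. The helper name pick_summary and the mapping of groups to functions below are hypothetical, chosen only for illustration:

```r
library(dplyr)

# Hypothetical helper: maps a group label to a summary function.
# The labels and function choices here are illustrative only.
pick_summary <- function(key) {
  switch(as.character(key),
         setosa     = sum,
         versicolor = max,
         mean)  # fallback for any other group
}

result <- iris %>%
  group_by(Species) %>%
  # first(Species) is a single value per group, so exactly one
  # function is selected and applied to that group's Petal.Width
  summarise(pwz = pick_summary(first(Species))(Petal.Width, na.rm = TRUE))

result
```

Only the selected function is evaluated for each group, so no intermediate summary columns need to be created and dropped afterwards.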

Note:

It probably makes more sense to "spread" your data with tidyr::spread (superseded by tidyr::pivot_wider in current tidyr releases) so that each day has a column for temperature, rainfall, etc. Then you can use summarise in the usual way.
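
A rough sketch of that reshaping idea, using tidyr::pivot_wider. The weather tibble, its column names, and its values are all made up for illustration, since the real data is not shown in the question:

```r
library(dplyr)
library(tidyr)

# Hypothetical long-format data: one row per day and measured variable
weather <- tibble::tibble(
  day      = rep(1:3, each = 2),
  variable = rep(c("temperature", "rainfall"), times = 3),
  value    = c(20, 0, 22, 5, 18, 1)
)

# One row per day, one column per variable
wide <- weather %>%
  pivot_wider(names_from = variable, values_from = value)

# Each column now gets its own summary function directly
summary_tbl <- wide %>%
  summarise(mean_temperature = mean(temperature),
            total_rainfall   = sum(rainfall))

summary_tbl
```

Once each variable has its own column, the choice of summary function no longer depends on a condition evaluated inside summarise.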

IceCreamToucan answered Jan 30 '23 18:01