Dplyr

Question

I am able to check for a specific column if it exists using 'contains' in dplyr . I struggle with evaluating the summary of the expression if it does not exist.

Here is my code snippet:

  df <- Prod%>%
      group_by(Entity)%>%
      select(Entity,`Cum.Oil`,`Cum.Gas`,contains("EUR")%>%
      summarise(Oil = mean(`Cum.Oil`), Gas = mean(`Cum.Gas`), EUR=mean(EUR))

How can I ignore 'EUR' expression in the summarise expression if the EUR column does not exist?

Juan C · Accepted Answer

Something like this should work:

df <- Prod%>%
      group_by(Entity)%>%
      summarise(across(any_of(c('Cum.Oil', 'Cum.Gas', 'Eur')), ~mean(.x), 
                .names = '{.col %>% str_remove("Cum.")}' )

Can't test without some reprex, though.

Tip: You can also use any_of in select statements:

df <- Prod%>%
      group_by(Entity)%>%
      select(any_of(c('Entity', 'Cum.Oil', 'Cum.Gas', "EUR"))

asd-tm · Answer

If it is more convinient you can use ifelse constructions within dplyr pipes as well though I consider @Juan answer more Rly elegant:

Prod <- data.frame(Entity = c("a", "b", "a"),`Cum.Oil` = 1:3,`Cum.Gas`=c(2,4,6)
                   , EUR = c(7,9,9)
)
Prod %>% 
  group_by(Entity) %>% {
  ifelse(exists("EUR", .), 
         . <- summarise(., Oil = mean(`Cum.Oil`), 
                        Gas = mean(`Cum.Gas`), 
                        EUR = mean(EUR)
         ), 
         . <- summarise(., Oil = mean(`Cum.Oil`), 
                        Gas = mean(`Cum.Gas`)
         )
  )  
  .
}

Dplyr - Select if a column exists and summarize if it does exist

Tags:

r

CodeMaster

2 Answers

Juan C

asd-tm

Recent Activity

Donate For Us