
How to handle pipes producing empty data further down the dplyr pipeline

Tags: r, dplyr

This issue occurs in dplyr version 0.3.0.

I have a chain of pipes (%>%) starting with a filter. Sometimes this filter reduces the data frame to zero rows. Somewhere further down the pipeline, I have a function that uses if to decide how to mutate the data frame. However, this function errors if the data frame has already been reduced to zero rows.

For example:

library(dplyr)
data(mtcars)

# Doubles x when it equals 6; expects a single value, not a vector
stupid_function <- function(x) {
    if (x == 6) {
        return(2 * x)
    } else {
        return(x)
    }
}

for(i in 6:10) {

    data <-
        mtcars %>% 
        filter(cyl == i) %>%
        rowwise() %>%
        mutate(carb2 = stupid_function(carb)) %>%
        group_by(carb2) %>%
        summarise(mean(wt))

    print(data)

}

This works for i = 6 but fails for i = 7, for example, because no rows have cyl == 7.

Is there any way to handle this problem? Two approaches I have considered are breaking up the chain in the middle to check that the data has at least one row after filtering, or wrapping everything in a tryCatch.
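For concreteness, the first approach (splitting the chain and checking the row count) might look like the sketch below. The helper name summarise_if_any and the choice to return NULL for an empty filter result are assumptions made for illustration, not part of the original code.

```r
library(dplyr)

stupid_function <- function(x) {
  if (x == 6) 2 * x else x
}

# Hypothetical helper: run the rest of the pipeline only when rows remain.
summarise_if_any <- function(i) {
  filtered <- mtcars %>% filter(cyl == i)
  if (nrow(filtered) == 0) {
    return(NULL)  # assumption: signal "nothing to summarise" with NULL
  }
  filtered %>%
    rowwise() %>%
    mutate(carb2 = stupid_function(carb)) %>%
    ungroup() %>%  # drop the rowwise grouping before regrouping
    group_by(carb2) %>%
    summarise(mean_wt = mean(wt))
}

results <- lapply(6:10, summarise_if_any)
```

Each element of results is either a summary data frame or NULL, so the caller can skip the empty cases explicitly instead of relying on an error.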

Asked by Alex on Nov 09 '22 at 20:11


1 Answer

Firstly, in the latest version of dplyr (0.4.0), filter no longer crashes but returns its input when the output is 0-sized (see #782), so you may no longer see this error. Specifically:

library(dplyr)
data(mtcars)

stupid_function <- function(x){
  if(x == 6){
    return(2 * x)
  } else {
    return(x)
  }
}

for(i in 6:10) {

  data <-
    mtcars %>% 
    filter(cyl == i) %>%
    rowwise() %>%
    mutate(carb2 = stupid_function(carb)) %>%
    group_by(carb2) %>%
    summarise(mean(wt))

  print(data)

}

Returns:

Source: local data frame [3 x 2]

  carb2 mean(wt)
1     1  3.33750
2     4  3.09375
3    12  2.77000
Source: local data frame [0 x 2]

Variables not shown: carb2 (dbl), mean(wt) (dbl)
Source: local data frame [4 x 2]

  carb2 mean(wt)
1     2 3.560000
2     3 3.860000
3     4 4.433167
4     8 3.570000
Source: local data frame [0 x 2]

Variables not shown: carb2 (dbl), mean(wt) (dbl)
Source: local data frame [0 x 2]

Variables not shown: carb2 (dbl), mean(wt) (dbl)
Warning messages:
1: Grouping rowwise data frame strips rowwise nature 
2: Grouping rowwise data frame strips rowwise nature 
3: Grouping rowwise data frame strips rowwise nature 
4: Grouping rowwise data frame strips rowwise nature 
5: Grouping rowwise data frame strips rowwise nature 

You may also want to trap for 0-sized input in stupid_function with something like this:

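stupid_function <- function(x = NULL) {
  # length() is 0 for NULL and for zero-length vectors such as numeric(0)
  if (length(x) == 0) {
    return(0)
  } else if (x == 6) {
    return(2 * x)
  } else {
    return(x)
  }
}

This defaults x to NULL and returns 0 (or you could return NULL) when x is missing or empty. The check uses length(x) == 0 rather than is.null(x) because a zero-length vector such as numeric(0) is not NULL, and if (numeric(0) == 6) fails with an "argument is of length zero" error.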
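Another option, sketched here as an assumption rather than something from the original code, is to make the function vectorised so that it needs neither rowwise() nor a special case for empty input. The name double_if_six and the helper summarise_cyl are hypothetical. Because ifelse() applied to a zero-length vector simply returns a zero-length vector, the pipeline passes through cleanly when the filter matches nothing, and the "Grouping rowwise data frame strips rowwise nature" warnings do not arise since rowwise() is never called.

```r
library(dplyr)

# Hypothetical vectorised replacement for stupid_function:
# doubles the elements equal to 6 and leaves the others unchanged.
double_if_six <- function(x) {
  ifelse(x == 6, 2 * x, x)
}

# Hypothetical wrapper around the pipeline from the question.
summarise_cyl <- function(i) {
  mtcars %>%
    filter(cyl == i) %>%
    mutate(carb2 = double_if_six(carb)) %>%
    group_by(carb2) %>%
    summarise(mean_wt = mean(wt))
}

res6 <- summarise_cyl(6)  # cyl == 6 has matching rows
res7 <- summarise_cyl(7)  # cyl == 7 has none, so the result has zero rows
```

This changes the contract of the function (it now takes a whole column instead of a single value), so it only fits when the per-row logic can be written in vectorised form.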

Answered by Avraham on Nov 15 '22 at 07:11