
Filter try-error objects from a column / list (dplyr but also more general)

Tags: r, dplyr, purrr

I have some data in a data frame and want to apply a function to every element of a column. Usually I use purrr::map() for this. However, this fails if the function throws an error for even one element of the column:

f <- function(x) {
  if(x==2) stop("I hate 2") else x
}

library(dplyr)
dd <- data.frame(x = c(1:2))
dd2 <- dd %>% 
  mutate(fx = purrr::map(.x = x, .f = ~f(.)))
#> Error: I hate 2

So I wrapped f in try(), which gives a list-column of results, with a try-error object wherever f failed:

dd2 <- dd %>% 
  mutate(fx = purrr::map(.x = x, .f = ~try(f(.))))
#> Error in f(.) : I hate 2
dd2
#>   x                         fx
#> 1 1                          1
#> 2 2 Error in f(.) : I hate 2\n

Ideally I would now use filter() to drop the row(s) containing errors, but I can't get it to work. None of these yields a data frame containing only the first row:

dd2 %>% filter(is.integer(fx) )
dd2 %>% filter(is.integer(.$fx) )

dd2 %>% filter(class(fx) != "try-error")
dd2 %>% filter(class(.$fx) != "try-error")

lapply(dd2, is.numeric)
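All of these attempts fail for the same reason: fx is a list-column, so is.integer(fx) and class(fx) inspect the whole list (whose class is "list") and return a single value, which filter() then recycles across every row instead of testing each row. A minimal sketch, not from the original post, that keeps the try() approach but tests each element separately with purrr::map_lgl() and inherits(). The silent = TRUE argument is an addition that only suppresses the printed error message:

```r
library(dplyr)

f <- function(x) {
  if (x == 2) stop("I hate 2") else x
}

dd <- data.frame(x = 1:2)

# silent = TRUE hides the printed message; try() still returns a "try-error" object
dd2 <- dd %>%
  mutate(fx = purrr::map(x, ~ try(f(.x), silent = TRUE)))

# map_lgl() yields one TRUE/FALSE per element, so filter() gets a per-row test
ok_rows <- dd2 %>%
  filter(purrr::map_lgl(fx, ~ !inherits(.x, "try-error")))

ok_rows
```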

A dirty trick I was considering is to use tryCatch() instead and make it return an object of the same type as f() on error (for example -99999 here), then filter those rows out, but I am looking for a cleaner solution.
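For comparison, the sentinel trick might look like the minimal sketch below. It assumes NA_real_ as the sentinel instead of -99999, and the helper name f_or_na is hypothetical. tryCatch() returns the error handler's value when f() fails, so map_dbl() can build a plain numeric column:

```r
library(dplyr)

f <- function(x) {
  if (x == 2) stop("I hate 2") else x
}

dd <- data.frame(x = c(1, 2))

# hypothetical helper: returns the sentinel NA_real_ instead of throwing
f_or_na <- function(x) {
  tryCatch(f(x), error = function(e) NA_real_)
}

dd3 <- dd %>%
  mutate(fx = purrr::map_dbl(x, f_or_na)) %>%  # plain numeric column, not a list-column
  filter(!is.na(fx))

dd3
```

The weakness is that the sentinel is ambiguous if f() can legitimately return NA.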

Asked by Theodor, Feb 07 '23 07:02


1 Answer

Since you are already using purrr, you could wrap the function with purrr::safely(). safely() returns a modified version of the function that, instead of throwing, always returns a list of two elements, result and error. One of these is always NULL.
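A minimal sketch of what the wrapped function returns for a single successful and a single failing call:

```r
f <- function(x) {
  if (x == 2) stop("I hate 2") else x
}

f_safe <- purrr::safely(f)

ok  <- f_safe(1)  # result holds the value, error is NULL
bad <- f_safe(2)  # result is NULL, error holds the condition object

names(ok)
is.null(ok$error)
conditionMessage(bad$error)
```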

Here's the data setup, similar to the original post.

library(dplyr)
df <- data.frame(x = c(1:2, 1))

f <- function(x) {
  if (x == 2) stop("I hate 2") else x
}

We wrap the function with safely() and map the wrapped version over x.

f_safe <- purrr::safely(f)

df2 <- df %>% mutate(fxx = x %>% purrr::map(.f = f_safe))
df2
#>   x               fxx
#> 1 1                 1
#> 2 2 I hate 2, .f(...)
#> 3 1                 1

We can confirm that fxx is a list-column in which each element is a list with result and error components.

str(df2$fxx)
#> List of 3
#>  $ :List of 2
#>   ..$ result: num 1
#>   ..$ error : NULL
#>  $ :List of 2
#>   ..$ result: NULL
#>   ..$ error :List of 2
#>   .. ..$ message: chr "I hate 2"
#>   .. ..$ call   : language .f(...)
#>   .. ..- attr(*, "class")= chr [1:3] "simpleError" "error" "condition"
#>  $ :List of 2
#>   ..$ result: num 1
#>   ..$ error : NULL

Now we just ask each element of the list-column whether its error is NULL.

df2 <- df2 %>% 
  mutate(no_error = fxx %>% purrr::map_lgl(.f = ~ is.null(.x$error)))
df2
#>   x               fxx no_error
#> 1 1                 1     TRUE
#> 2 2 I hate 2, .f(...)    FALSE
#> 3 1                 1     TRUE

I used map_lgl() so that the result is not a list-column but a plain logical vector that filter() can use directly.

df2 %>% filter(no_error)
#>   x fxx no_error
#> 1 1   1     TRUE
#> 2 1   1     TRUE

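If we want to use the fxx column like a regular vector, we have to convert it from a list-column to a simple vector with mutate(fxx = fxx %>% purrr::map_dbl("result")). Do this after filtering: map_dbl() needs every element to yield a single number, and the failed elements have a NULL result.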
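Putting the steps together, a minimal sketch that drops the failed calls first and only then extracts the results (here the error check is done inside filter() rather than through a separate no_error column):

```r
library(dplyr)

f <- function(x) {
  if (x == 2) stop("I hate 2") else x
}
f_safe <- purrr::safely(f)

df <- data.frame(x = c(1:2, 1))

clean <- df %>%
  mutate(fxx = purrr::map(x, f_safe)) %>%
  # keep only rows whose call did not error
  filter(purrr::map_lgl(fxx, ~ is.null(.x$error))) %>%
  # every remaining element has a numeric result, so map_dbl() succeeds
  mutate(fxx = purrr::map_dbl(fxx, "result"))

clean
```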

Edit: Another solution would be to wrap f with dplyr::failwith(), returning a sentinel value such as NA (or a string like "error") on failure, and then filter out the elements that match the sentinel value.
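dplyr::failwith() has since been deprecated, and its documentation points to purrr::possibly() instead, which returns a fixed otherwise value whenever the wrapped function throws. A minimal sketch, assuming NA_real_ as the sentinel and that f() never legitimately returns NA (the name f_possibly is hypothetical):

```r
library(dplyr)

f <- function(x) {
  if (x == 2) stop("I hate 2") else x
}

# hypothetical name: returns NA_real_ whenever f() throws an error
f_possibly <- purrr::possibly(f, otherwise = NA_real_)

df <- data.frame(x = c(1:2, 1))

df3 <- df %>%
  mutate(fx = purrr::map_dbl(x, f_possibly)) %>%
  filter(!is.na(fx))

df3
```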

Answered by TJ Mahr, Feb 09 '23 00:02