I find it difficult to debug my code when using purrr and some of the map() variants. In particular, I have trouble locating where my code fails, because the error messages do not tell me which row (data frame) threw the error.
What is a good approach to locating errors when using purrr?
Consider the following example:
library(tidyverse)
# Prepare some data
set.seed(1)
a <- tibble(
  x = rnorm(2),
  y = rnorm(2))
b <- tibble(
  x = rnorm(2),
  y = rnorm(2))
c <- tibble(
  x = rnorm(2),
  y = letters[1:2])
df <- tibble(
  dataframes = list(a, b, c))
df
#> # A tibble: 3 x 1
#> dataframes
#> <list>
#> 1 <tibble [2 x 2]>
#> 2 <tibble [2 x 2]>
#> 3 <tibble [2 x 2]>
# A simple function
add_cols <- function(.data) {
  .data %>%
    mutate(
      z = x + y)
}
# Running the function with map() will return an error
df %>%
  mutate(
    dataframes = map(.x = dataframes, ~add_cols(.x)))
#> Error in x + y: non-numeric argument to binary operator
map() returns an error because you can't add a number and a letter. The error message tells us what went wrong, but not where it went wrong. In this example, it is obvious that the error comes from the third row in df, but imagine the function was much more complicated and that we were applying it to thousands of rows. How would you locate the error then?
So far, my approach is to use some version of this monstrosity of a loop. I think the disadvantages of this approach are quite obvious. Please help me find a better way to do this.
for (i in 1:nrow(df)) {
  print(paste("Testing row number", i))
  df %>%
    filter(row_number() == i) %>%
    unnest(cols = c(dataframes)) %>%
    add_cols()
}
#> [1] "Testing row number 1"
#> [1] "Testing row number 2"
#> [1] "Testing row number 3"
#> Error in x + y: non-numeric argument to binary operator
I am using RStudio, in case that is relevant to your suggestions.
Created on 2019-10-15 by the reprex package (v0.3.0)
We can use possibly or safely from purrr and specify the otherwise argument, so that a failing call returns a marker value instead of stopping the whole map().
library(dplyr)
library(purrr)
out <- df %>%
  mutate(dataframes = map(dataframes, ~
    possibly(add_cols, otherwise = 'error here')(.x)))
out$dataframes
#[[1]]
# A tibble: 2 x 3
# x y z
# <dbl> <dbl> <dbl>
#1 -0.626 -0.836 -1.46
#2 0.184 1.60 1.78
#[[2]]
# A tibble: 2 x 3
# x y z
# <dbl> <dbl> <dbl>
#1 0.330 0.487 0.817
#2 -0.820 0.738 -0.0821
#[[3]]
#[1] "error here"
The failing element can then be located with a simple check:
out$dataframes %in% 'error here'
#[1] FALSE FALSE TRUE
To find the position, wrap the check in which().
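As a minimal sketch of both wrappers, the snippet below rebuilds the toy data from the question and finds the failing position twice: once with possibly() plus which(), and once with safely(), which also keeps the error message. The names dfs, possibly_add_cols, safe_add_cols, failed, failed_safely and messages are made up for illustration, and the identical() comparison is just one way to test for the marker value.

```r
library(dplyr)
library(purrr)
library(tibble)

# Same toy data as in the question: the third tibble has a character column y
set.seed(1)
dfs <- list(
  tibble(x = rnorm(2), y = rnorm(2)),
  tibble(x = rnorm(2), y = rnorm(2)),
  tibble(x = rnorm(2), y = letters[1:2])
)

add_cols <- function(.data) {
  .data %>% mutate(z = x + y)
}

# 1) possibly(): wrap the function once, then locate the marker value.
#    which() turns the logical vector into positions.
possibly_add_cols <- possibly(add_cols, otherwise = "error here")
out <- map(dfs, possibly_add_cols)
failed <- which(map_lgl(out, ~ identical(.x, "error here")))
failed

# 2) safely(): each element becomes list(result = ..., error = ...),
#    so the original error is kept and can be inspected afterwards.
safe_add_cols <- safely(add_cols)
res <- map(dfs, safe_add_cols)
failed_safely <- which(map_lgl(res, ~ !is.null(.x$error)))
messages <- map_chr(res[failed_safely], ~ conditionMessage(.x$error))
messages
```

With this data, both failed and failed_safely should point at the third element. The safely() version is probably the more useful one when the function can fail in several different ways, because the message for each failing position is preserved.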