
Debugging in plyr or dplyr - seeing which group

When I use plyr or dplyr to analyze a big dataset that is grouped by an id, I sometimes get an error inside my function. I can use browser() or debugger() to explore what's going on, but I don't know whether the problem is with the first id or the 100th. The debugger lets me stop at the error, but is there an easy way to see which id caused the problem, other than passing the id as a function input for the sole purpose of debugging? I illustrate with the example below.

meanerr = function(y) {
  m = mean(y)
  stopifnot(!is.na(m))
  return(m)
}

library(plyr)
d = data.frame(id=c(1,1,1,1,2,2),y=c(1,2,3,4,5,NA))
dsumm = ddply(d,"id",summarise,mean=meanerr(y))

Of course this causes the error below, and when I dive into the dump, I have no clue where to look (see below):

> options(error=dump.frames)
> source('~/svn/pgm/test_debug_ddply.R')
Error: !is.na(m) is not TRUE
> debugger()
Message:  Error: !is.na(m) is not TRUE
Available environments had calls:
1: source("~/svn/pgm/test_debug_ddply.R")
2: withVisible(eval(ei, envir))
3: eval(ei, envir)
4: eval(expr, envir, enclos)
5: test_debug_ddply.R#9: ddply(d, "id", summarise, mean = meanerr(y))
6: ldply(.data = pieces, .fun = .fun, ..., .progress = .progress, .inform = .inform, .parallel = .
7: llply(.data = .data, .fun = .fun, ..., .progress = .progress, .inform = .inform, .parallel = .p
8: loop_apply(n, do.ply)
9: (function (i) 
{
    piece <- pieces[[i]]
    if (.inform) {
        res <- try(.fun(piece, ...))

10: .fun(piece, ...)
11: eval(cols[[col]], .data, parent.frame())
12: eval(expr, envir, enclos)
13: meanerr(y)
14: test_debug_ddply.R#3: stopifnot(!is.na(m))
15: stop(sprintf(ngettext(length(r), "%s is not TRUE", "%s are not all TRUE"), ch), call. = FALSE, 

Anyway, maybe just including the id as an input every single time for easy debugging is just the way to go, but I was wondering if there was something more elegant that the professionals use without requiring the passing of extra variables.

Andy

Andy Stein asked Jan 13 '16 16:01


1 Answer

I run into this all the time with dplyr's group_by(), and I've had trouble using my usual options(error=recover).

I've found that wrapping the offending function in a tryCatch() does the trick. The failing group then shows up as "error" in the result instead of aborting the whole call:

> dsumm = ddply(d,"id",summarise,mean=tryCatch(meanerr(y),error=function(e){"error"}))
> dsumm
  id   mean
1  1    2.5
2  2  error
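
Note that returning "error" turns the whole mean column into character. If you would rather keep the error but have it name the group, here is a minimal sketch along the same lines. The helper with_group_info() and the function summ_fun() are hypothetical names made up for this illustration, not part of plyr, and the sketch assumes the function passed to ddply receives each piece as a data frame that still contains the id column. plyr's own .inform = TRUE argument (visible in the traceback above) is another option; it prints the failing piece at the cost of speed.

```r
library(plyr)

meanerr <- function(y) {
  m <- mean(y)
  stopifnot(!is.na(m))
  m
}

d <- data.frame(id = c(1, 1, 1, 1, 2, 2), y = c(1, 2, 3, 4, 5, NA))

# Hypothetical helper (not part of plyr): wraps a per-piece function so that
# any error is re-raised with the grouping values of the failing piece.
with_group_info <- function(f, group_vars) {
  function(piece, ...) {
    tryCatch(
      f(piece, ...),
      error = function(e) {
        ids <- piece[1, group_vars, drop = FALSE]
        label <- paste(names(ids), unlist(ids), sep = "=", collapse = ", ")
        stop("group [", label, "]: ", conditionMessage(e), call. = FALSE)
      }
    )
  }
}

# Per-piece summary written as a plain function of the piece.
summ_fun <- function(piece) data.frame(mean = meanerr(piece$y))

# Capture the message here only to show it; in real use, let the error stop.
res <- tryCatch(
  ddply(d, "id", with_group_info(summ_fun, "id")),
  error = function(e) conditionMessage(e)
)
print(res)

# Alternative using plyr's own argument (prints the failing piece, slower):
# ddply(d, "id", summarise, mean = meanerr(y), .inform = TRUE)
```

The error message now starts with the id of the offending group, so meanerr() itself never needs to take the id as an input.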
avoorman answered Nov 16 '22 17:11