Debugging in plyr or dplyr - seeing which group

Question

When I'm using plyr or dplyr to analyze a big dataset that is grouped by an id, I sometimes get an error in my function. I can use browser() or debugger() to explore what's going on, but one issue is that I don't know whether the problem is with the first id or the 100th. I can use the debugger to stop at the error, but is there an easy way to see which id caused the problem, short of passing the id into the function for the sole purpose of debugging? I illustrate with the example below.

library(plyr)

meanerr = function(y) {
  m = mean(y)
  stopifnot(!is.na(m))
  return(m)
}

d = data.frame(id=c(1,1,1,1,2,2),y=c(1,2,3,4,5,NA))
dsumm = ddply(d,"id",summarise,mean=meanerr(y))

Of course this causes the error below, and when I dive into the dump, I have no clue where to look (see below).

> options(error=dump.frames)
> source('~/svn/pgm/test_debug_ddply.R')
Error: !is.na(m) is not TRUE
> debugger()
Message:  Error: !is.na(m) is not TRUE
Available environments had calls:
1: source("~/svn/pgm/test_debug_ddply.R")
2: withVisible(eval(ei, envir))
3: eval(ei, envir)
4: eval(expr, envir, enclos)
5: test_debug_ddply.R#9: ddply(d, "id", summarise, mean = meanerr(y))
6: ldply(.data = pieces, .fun = .fun, ..., .progress = .progress, .inform = .inform, .parallel = .
7: llply(.data = .data, .fun = .fun, ..., .progress = .progress, .inform = .inform, .parallel = .p
8: loop_apply(n, do.ply)
9: (function (i) 
{
    piece <- pieces[[i]]
    if (.inform) {
        res <- try(.fun(piece, ...))

10: .fun(piece, ...)
11: eval(cols[[col]], .data, parent.frame())
12: eval(expr, envir, enclos)
13: meanerr(y)
14: test_debug_ddply.R#3: stopifnot(!is.na(m))
15: stop(sprintf(ngettext(length(r), "%s is not TRUE", "%s are not all TRUE"), ch), call. = FALSE,
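The dump never says which id was being processed, because the error message only describes the failed assertion. A minimal base-R sketch of the same split-apply loop shows this directly. It assumes split() plus lapply() is a fair stand-in for what ddply() does per group; it is not plyr's actual code.

```r
# Same function and data as in the question (base R only, no plyr needed)
meanerr <- function(y) {
  m <- mean(y)
  stopifnot(!is.na(m))
  m
}
d <- data.frame(id = c(1, 1, 1, 1, 2, 2), y = c(1, 2, 3, 4, 5, NA))

# Split y by id and apply meanerr to each piece; capture the error text if one occurs
msg <- tryCatch(
  lapply(split(d$y, d$id), meanerr),
  error = function(e) conditionMessage(e)
)
print(msg)  # describes the failed assertion on m, but says nothing about the id
```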

Anyway, maybe including the id as an input every time is simply the way to go for easy debugging, but I was wondering whether there is something more elegant that professionals use, one that doesn't require passing extra variables.

Andy

avoorman · Accepted Answer

I run into this all the time with dplyr's group_by(), where I've had trouble using my usual options(error=recover).
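If you want to keep using options(error=recover), one possible approach is withCallingHandlers(). Unlike the exiting handler of tryCatch(), its handler runs before the stack is unwound, so it can report the group id and then let the error continue on to recover with the original frames intact. Here is a sketch in base R. The helper name run_by_group() is hypothetical, and split()/lapply() stand in for ddply() or group_by().

```r
# Hypothetical helper: apply fun to each group; on failure, report the group, then let the error propagate
run_by_group <- function(df, group_col, fun) {
  pieces <- split(df, df[[group_col]])
  lapply(names(pieces), function(g) {
    withCallingHandlers(
      fun(pieces[[g]]),
      error = function(e) {
        # Runs before the stack unwinds; returning normally lets the error continue
        message("Error while processing ", group_col, " = ", g)
      }
    )
  })
}

meanerr <- function(y) {
  m <- mean(y)
  stopifnot(!is.na(m))
  m
}
d <- data.frame(id = c(1, 1, 1, 1, 2, 2), y = c(1, 2, 3, 4, 5, NA))

# With options(error = recover) set, the group message is printed before recover's frame menu:
# run_by_group(d, "id", function(piece) meanerr(piece$y))
```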

I've found that wrapping the offending function in a tryCatch() does the trick:

> dsumm = ddply(d,"id",summarise,mean=tryCatch(meanerr(y),error=function(e){"error"}))
> dsumm
  id   mean
1  1    2.5
2  2  error
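One catch with returning the string "error" is that the mean column can no longer be a plain numeric column. Below is a variation, sketched in base R. It assumes split()/lapply() as a stand-in for ddply(), and safe_group_means() is a hypothetical name. It returns NA for a failing group and keeps the error message in its own column, so you can see both which id failed and why.

```r
meanerr <- function(y) {
  m <- mean(y)
  stopifnot(!is.na(m))
  m
}
d <- data.frame(id = c(1, 1, 1, 1, 2, 2), y = c(1, 2, 3, 4, 5, NA))

# Hypothetical helper: per-id mean that records any error message instead of stopping
safe_group_means <- function(df) {
  pieces <- split(df, df$id)
  rows <- lapply(names(pieces), function(g) {
    tryCatch(
      data.frame(id = g, mean = meanerr(pieces[[g]]$y), error = NA_character_),
      error = function(e) data.frame(id = g, mean = NA_real_, error = conditionMessage(e))
    )
  })
  do.call(rbind, rows)  # one row per id; mean stays numeric
}

dsumm <- safe_group_means(d)
print(dsumm)
```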

Tags: r, debugging, dplyr, plyr

Andy Stein
