
Defining custom dplyr methods in an R package

I have a package with custom summary() and print() methods for objects that have a particular class. This package also uses the wonderful dplyr package for data manipulation, and I expect my users to write scripts that use both my package and dplyr.

One roadblock, which has been noted by others here and here, is that dplyr verbs don't preserve custom classes. That means an ungroup command can strip my data.frames of their custom classes and thus screw up method dispatch for summary, etc.

Hadley says "doing this correctly is up to you - you need to define a method for your class for each dplyr method that correctly restores all the classes and attributes" and I'm trying to take the advice - but I can't figure out how to correctly wrap the dplyr verbs.

Here's a simple toy example. Let's say I've defined a cars class, and I have a custom summary for it.

this works

library(tidyverse)

class(mtcars) <- c('cars', class(mtcars))

summary.cars <- function(x, ...) {
  #gather some summary stats
  df_dim <- dim(x)
  quantile_sum <- map(x, quantile)  # summarise the object passed in, not the global mtcars
  
  cat("A cars object with:\n")
  cat(df_dim[[1]], 'rows and ', df_dim[[2]], 'columns.\n')
  
  print(quantile_sum)

}

summary(mtcars)

here's the problem

small_cars <- mtcars %>% filter(cyl < 6)
summary(small_cars)
class(small_cars)

That summary call for small_cars just gives me the generic summary, not my custom method, because small_cars no longer has the cars class after dplyr filtering.

what I tried

First I tried writing a custom method around filter (filter.cars). That didn't work, because filter is actually a wrapper around filter_ that allows for non-standard evaluation.

So I wrote a custom filter_ method for cars objects, attempting to implement @jwdink's advice:

filter_.cars <- function(df, ...) {
  
  old_classes <- class(df)
  out <- dplyr::filter_(df, ...)
  new_classes <- class(out)
  
  class(out) <- c(new_classes, old_classes) %>% unique()
  
  out
}

That doesn't work - I get an infinite recursion error:

Error: evaluation nested too deeply: infinite recursion / options(expressions=)?
Error during wrapup: evaluation nested too deeply: infinite recursion / options(expressions=)?

All I want to do is grab the classes on the incoming df, hand off to dplyr, then return the object with the same classnames as it had before the dplyr call. How do I change my filter_ wrapper to accomplish that? Thanks!

Andrew asked Jan 31 '17 21:01



1 Answer

UPDATE:

Some things have changed since my original answer:

  • Many dplyr verbs no longer remove custom classes; for example, dplyr::filter keeps the class. However, some — like dplyr::group_by — still remove the class, so this question lives on.
  • With R 3.5 and beyond, method lookup changed its scoping rules
  • The trailing-underscore versions of the verbs are deprecated

I recently ran into a hard-to-diagnose issue caused by the second bullet, so I wanted to give a fuller example. Let's say you're using a custom class named custom_class, and you want to add a group_by method. Assuming you're using roxygen:

#' group_by.custom_class
#' 
#' @description Preserve the class of a `custom_class` object.
#' @inheritParams dplyr::group_by
#'
#' @importFrom dplyr group_by
#'
#' @export
#' @method group_by custom_class
group_by.custom_class <- function(.data, ...) {
  result <- NextMethod()
  return(reclass(.data, result))
}

(see original answer for definition of reclass function)

Highlights:

  • You need @method group_by custom_class to add S3method(group_by,custom_class) to NAMESPACE
  • You need @importFrom dplyr group_by to add importFrom(dplyr,group_by) to your NAMESPACE

I believe in R < 3.5 you could get away with just that second one, but now you need both.
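
To try the same idea outside a package, here is a minimal script-level sketch. It assumes dplyr is installed and that group_by still drops the extra class in your version. The call to registerS3method() is my stand-in for the S3method(group_by,custom_class) directive that roxygen writes to NAMESPACE. The reclass helper is the one from the old answer, repeated so the snippet runs on its own.

```r
library(dplyr)

# Helper from the old answer: put the original leading class back on the result.
reclass <- function(x, result) UseMethod("reclass")
reclass.default <- function(x, result) {
  class(result) <- unique(c(class(x)[[1]], class(result)))
  result
}

group_by.custom_class <- function(.data, ...) {
  result <- NextMethod()   # hand off to dplyr's own data.frame method
  reclass(.data, result)
}

# Assumption: registering by hand plays the role that the NAMESPACE
# directive S3method(group_by, custom_class) plays inside a package.
registerS3method("group_by", "custom_class", group_by.custom_class,
                 envir = asNamespace("dplyr"))

df <- structure(mtcars, class = c("custom_class", "data.frame"))
grouped <- group_by(df, cyl)

inherits(grouped, "custom_class")  # the custom class should survive grouping
```

Inside a package you would not call registerS3method() yourself; the two roxygen tags listed above take care of registration when the namespace is loaded.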


OLD ANSWER:

Further suggestions were offered in the thread so I thought I'd update with what seems to be best practice, which is to use NextMethod().

filter_.cars <- function(.data, ...) {
  result <- NextMethod()
  reclass(.data, result)
}

Where reclass is written by you; it's just a generic that (at least) adds the original class back on:

reclass <- function(x, result) {
  UseMethod('reclass')
}

reclass.default <- function(x, result) {
  class(result) <- unique(c(class(x)[[1]], class(result)))
  result
}
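
As for why the wrapper in the question recursed: the object still carries the cars class when it is passed to dplyr::filter_(), so the generic dispatches straight back to filter_.cars. NextMethod() instead continues with the next class in the class vector. Below is a base-R sketch of that difference. keep_rows is a hypothetical generic I made up to stand in for a dplyr verb that strips extra classes; it is not part of dplyr.

```r
# Hypothetical generic standing in for a class-dropping dplyr verb.
keep_rows <- function(.data, keep) UseMethod("keep_rows")

keep_rows.data.frame <- function(.data, keep) {
  out <- .data[keep, , drop = FALSE]
  class(out) <- "data.frame"   # mimic a verb that strips custom classes
  out
}

reclass <- function(x, result) UseMethod("reclass")
reclass.default <- function(x, result) {
  class(result) <- unique(c(class(x)[[1]], class(result)))
  result
}

# Calling the generic again from inside the method dispatches on "cars"
# once more, so a method written like this would never terminate:
# keep_rows.cars <- function(.data, keep) reclass(.data, keep_rows(.data, keep))

# NextMethod() skips "cars" and runs the data.frame method instead.
keep_rows.cars <- function(.data, keep) {
  result <- NextMethod()
  reclass(.data, result)
}

cars_df <- structure(mtcars, class = c("cars", "data.frame"))
small <- keep_rows(cars_df, cars_df$cyl < 6)
class(small)
# [1] "cars"       "data.frame"
```

Note that reclass.default only restores the first class of the original object; if your class sits on top of other custom classes, you would write a reclass method for it that restores whatever else is needed.
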
jwdink answered Sep 21 '22 15:09
