How to use namespaced function with dplyr::mutate_each?

I am trying to use dplyr::mutate_each with functions from other packages, without attaching those packages:

dplyr::tbl_df(iris) %>% 
    dplyr::mutate_each(dplyr::funs(stringi::stri_trim_both))

but it fails with the following error:

Error: unsupported type for column 'Sepal.Length' (CLOSXP, classes = function)

When I use a data.table instead of a data.frame, the error is:

Error in `[.data.table`(`_dt`, , `:=`(Sepal.Length, stringi::stri_trim_both), : RHS of assignment is not NULL, not an an atomic vector (see ?is.atomic) and not a list column.

If I assign the function to a local variable first, as below, everything works as expected.

trim_both <-  stringi::stri_trim_both
dplyr::tbl_df(iris) %>% dplyr::mutate_each(dplyr::funs(trim_both))

This is not an optimal solution, but I can live with it. Nevertheless, I would be grateful for an explanation of the source of the problem.

Session info:

R version 3.1.1 (2014-07-10)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] dplyr_0.4.1

loaded via a namespace (and not attached):
[1] assertthat_0.1       DBI_0.3.1            lazyeval_0.1.10.9000
[4] magrittr_1.5         parallel_3.1.1       Rcpp_0.11.4         
[7] stringi_0.4-1        tools_3.1.1         

Note: This problem no longer occurs in dplyr 0.7.2.

asked Feb 22 '15 02:02 by zero323


1 Answer

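The underlying reason is that dplyr::funs_ calls dplyr:::make_call, which handles each argument according to the class of the expression captured by lazyeval::lazy_dots. A bare name such as trim_both is wrapped into the call trim_both(.), whereas an expression that is already a call is kept as is. stringi::stri_trim_both is a call to the :: operator, so it is not wrapped, and the function itself, rather than its result, ends up as the column value (hence CLOSXP in the error message).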

class(lazyeval::lazy_dots(trim_both)[[1]]$expr)
## "name"
class(lazyeval::lazy_dots(stringi::stri_trim_both)[[1]]$expr)
## "call"
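
The same distinction can be reproduced in base R alone. The following is a minimal sketch, not dplyr's actual code: base::toupper stands in for stringi::stri_trim_both so that no extra package is needed, and wrap_in_call is a hypothetical helper that mirrors the substitute(f(.), ...) step applied to names.

```r
# A namespaced reference parses as a call to `::`, not as a plain name.
bare <- quote(toupper)
namespaced <- quote(base::toupper)
class(bare)        # "name"
class(namespaced)  # "call"

# Evaluating the unwrapped expression yields the function itself,
# which is the kind of value that ended up assigned to each column.
unwrapped_value <- eval(namespaced)
is.function(unwrapped_value)

# Hypothetical helper (not part of dplyr): wrap a function expression as f(.)
wrap_in_call <- function(expr) substitute(f(.), list(f = expr))

wrapped <- wrap_in_call(namespaced)
deparsed <- deparse(wrapped)

# Evaluate the wrapped call with `.` bound to a column-like vector.
wrapped_value <- eval(wrapped, list(. = c("a", "b")))
```

Once the namespaced expression is wrapped, evaluating it applies the function to the column instead of returning the function.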

The function my_funs below works around this by wrapping calls in f(.) as well. I have not tested it in any detail, and I assume there is a reason dplyr behaves differently, so do not use it as a default. It is mostly meant to clarify the problem.

# calling my_funs_ (instead of funs_)
my_funs <- function (...) 
  my_funs_(lazyeval::lazy_dots(...))

my_funs_ <- function(dots){
  dots <- lazyeval::as.lazy_dots(dots)
  env <- lazyeval::common_env(dots)
  names(dots) <- dplyr:::names2(dots)
  # difference here
  dots[] <- lapply(dots, function(x) {
    if (is.character(x$expr)) {
      x$expr <- substitute(f(.), list(f = as.name(x$expr)))
    }
    else if (is.name(x$expr)) {
      x$expr <- substitute(f(.), list(f = x$expr))
    }
    else if (is.call(x$expr)) {
      x$expr <- substitute(f(.), list(f = x$expr)) #### this line was different
      # originally x$expr <- x$expr
    }
    else {
      stop("Unknown inputs")
    }
    x
  })
  missing_names <- names(dots) == ""
  ### this is also different 
  default_names <- vapply(dots[missing_names], function(x) as.character(x)[1], 
                          character(1))
  ## originally dplyr:::make_name(x) instead of as.character(x)[1]
  names(dots)[missing_names] <- default_names
  class(dots) <- c("fun_list", "lazy_dots")
  dots
}

dplyr::tbl_df(iris) %>% 
  dplyr::mutate_each(my_funs(stringi::stri_trim_both))

answered Nov 11 '22 09:11 by shadow