
Error when using dplyr inside of a function

I'm trying to put together a function that creates a subset of my original data frame and then uses dplyr's select() and mutate() to give me the number of large/small entries, based on the sum of the length and width of the sepals/petals.

filter <- function (spp, LENGTH, WIDTH) {
  d <- subset (iris, subset=iris$Species == spp) # This part seems to work just fine
  large <- d %>%                       
    select (LENGTH, WIDTH) %>%   # This is where the problem arises.
    mutate (sum = LENGTH + WIDTH) 
  big_samples <- which(large$sum > 4)
 return (length(big_samples)) 
}

Basically, I want the function to return the number of large flowers. However, when I run the function I get the following error:

filter("virginica", "Sepal.Length", "Sepal.Width")

 Error: All select() inputs must resolve to integer column positions.
The following do not:
*  LENGTH
*  WIDTH 

What am I doing wrong?

ari8888 asked Dec 09 '15 19:12


2 Answers

You are running into non-standard evaluation (NSE) versus standard evaluation (SE) problems; see the vignette for more info.

Briefly, dplyr evaluates column names non-standardly: select(LENGTH, WIDTH) looks for columns literally named LENGTH and WIDTH instead of using the strings stored in those arguments. Passing column names into a function therefore breaks unless you use the standard evaluation (SE) versions of the verbs.

The SE versions of the dplyr functions end in _. You can see that select_ works nicely with your original arguments.

However, things get more complicated when the verb has to evaluate an expression, as mutate does. We can use lazyeval::interp to convert most function arguments into column names; see the conversion of the mutate call to mutate_ in your function below and, more generally, the help: ?lazyeval::interp

Try:

library(dplyr)

filter <- function(spp, LENGTH, WIDTH) {
    d <- subset(iris, subset = iris$Species == spp)
    large <- d %>%
        # SE version: accepts the column names as strings
        select_(LENGTH, WIDTH) %>%
        # Build the expression X + Y from the two column-name strings
        mutate_(sum = lazyeval::interp(~X + Y, X = as.name(LENGTH), Y = as.name(WIDTH)))
    big_samples <- which(large$sum > 4)
    return(length(big_samples))
}
jeremycg answered Sep 22 '22 02:09


UPDATE: As of dplyr 0.7.0 you can use tidy eval to accomplish this.

See http://dplyr.tidyverse.org/articles/programming.html for more details.

filter_big <- function(spp, LENGTH, WIDTH) {
  LENGTH <- enquo(LENGTH)                    # Create quosure
  WIDTH  <- enquo(WIDTH)                     # Create quosure

  iris %>% 
    filter(Species == spp) %>% 
    select(!!LENGTH, !!WIDTH) %>%            # Use !! to unquote the quosure
    mutate(sum = (!!LENGTH) + (!!WIDTH)) %>% # Use !! to unquote the quosure
    filter(sum > 4) %>% 
    nrow()
}

filter_big("virginica", Sepal.Length, Sepal.Width)

> filter_big("virginica", Sepal.Length, Sepal.Width)
[1] 50
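
On more recent releases the same idea can be written a little more compactly. The following is only a sketch, assuming dplyr 1.0 or later (with the rlang version it depends on); the function names count_big_chr and count_big and the threshold argument are made up for illustration. The first variant takes the column names as strings, as in the original call, and looks them up with the .data pronoun; the second takes bare column names and forwards them with the {{ }} operator.

```r
library(dplyr)

# Hypothetical helper: column names passed as strings.
# .data[[name]] looks up the column whose name is stored in `name`.
count_big_chr <- function(spp, length_col, width_col, threshold = 4) {
  iris %>%
    dplyr::filter(Species == spp) %>%
    mutate(sum = .data[[length_col]] + .data[[width_col]]) %>%
    dplyr::filter(sum > threshold) %>%
    nrow()
}

# Hypothetical helper: bare column names, forwarded with {{ }}.
count_big <- function(spp, length_col, width_col, threshold = 4) {
  iris %>%
    dplyr::filter(Species == spp) %>%
    mutate(sum = {{ length_col }} + {{ width_col }}) %>%
    dplyr::filter(sum > threshold) %>%
    nrow()
}

count_big_chr("virginica", "Sepal.Length", "Sepal.Width")
count_big("virginica", Sepal.Length, Sepal.Width)
```

Both calls should return 50, matching the result above. The verbs are written as dplyr::filter so the sketch still works in a session where a user-defined function named filter, like the one in the question, is masking dplyr's.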
Brad Cannell answered Sep 21 '22 02:09