
Creating and using new variables in a function in R: NSE programming error in the tidyverse

After reading and re-reading the many "Programming with dplyr" guides, I still cannot find a way to solve my particular case.

I understand that the use of group_by_, mutate_ and such "string-friendly" versions of tidyverse functions is heading toward deprecation, and that enquo is the way to go.

However, my case is somewhat different, and I'm struggling to find a tidy way to solve it.

My aim is to create and manipulate data frames within a function: creating (mutating) new variables based on others, then using them, and so on.

However, no matter how hard I try, my code either errors or triggers notes during the package check, such as "no visible binding for global variable ....".

Here's a reproducible example of what I want to do:

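library(dplyr)  # group_by(), summarise(), mutate(), arrange(), rename(), %>%
library(tidyr)  # complete()

df <- data.frame(X=c("A", "B", "C", "D", "E"),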
                 Y=c(1, 2, 3, 1, 1))
new_df <- df %>%
  group_by(Y) %>%
  summarise(N=n()) %>%
  mutate(Y=factor(Y, levels=1:5)) %>%
  complete(Y, fill=list(N = 0)) %>%
  arrange(Y) %>%
  rename(newY=Y) %>%
  mutate(Y=as.integer(newY))

These are common dplyr manipulations, and the expected result is:

# A tibble: 5 x 3
     newY     N     Y
<fctr> <dbl> <int>
1      1     3     1
2      2     1     2
3      3     1     3
4      4     0     4
5      5     0     5

I would like this piece of code to work quietly inside a function. The following was my best attempt to deal with the NSE issues:

myfunction <- function(){
  df <- data.frame(X=c("A", "B", "C", "D", "E"),
                   Y=c(1, 2, 3, 1, 1))
  new_df <- df %>%
    group_by_("Y") %>%
    summarise(!!"N":=n()) %>%
    mutate(!!"Y":=factor(Y, levels=1:5)) %>%
    complete_("Y", fill=list(N = 0)) %>%
    arrange_("Y") %>%
    rename(!!"newY":="Y") %>%
    mutate(!!"Y":=as.integer(newY))
}

Unfortunately, I still got the following messages:

myfunction: no visible global function definition for ':='
myfunction: no visible binding for global variable 'Y'
myfunction: no visible binding for global variable 'newY'
Undefined global functions or variables:
  := Y n.Factors n_optimal newY

Is there a way to solve it? Thanks a lot!

EDIT: I'm using R 3.4.1, dplyr_0.7.4, tidyr_0.7.2 and tidyverse_1.1.1


ANSWER

Thanks to the comments, I've managed to solve it. Here's the working solution:

myfunction <- function(){
  df <- data.frame(X=c("A", "B", "C", "D", "E"),
                   Y=c(1, 2, 3, 1, 1))
  new_df <- df %>%
    group_by_("Y") %>%
    summarise_("N"=~n()) %>%
    mutate_("Y"= ~factor(Y, levels=1:5)) %>%
    complete_("Y", fill=list(N = 0)) %>%
    arrange_("Y") %>%
    rename_("newY"=~Y) %>%
    mutate_("Y"=~as.integer(newY))
}

Thanks A LOT :)

Dominique Makowski asked Nov 10 '17 15:11


1 Answer

The answer wasn't in the "Programming with dplyr" guides because your issue is more general. Although your code deals with non-standard evaluation, your case does not need it: the column names are fixed inside the function rather than passed in by the caller. If you remove the code that deals with non-standard evaluation, you will reduce the number of problems you need to fix.
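To make the distinction concrete, here is a minimal sketch contrasting the two situations. The helper names count_by and count_by_y are hypothetical, and I'm assuming a recent dplyr in which dplyr::n() can be called with its namespace prefix.

```r
# Hypothetical helpers for illustration; neither name comes from the question.
`%>%` <- magrittr::`%>%`  # in a package, import the pipe instead

# Case 1: the caller chooses the column, so the argument must be
# captured with enquo() and unquoted with !!.
count_by <- function(data, group_col) {
  group_col <- rlang::enquo(group_col)
  data %>%
    dplyr::group_by(!!group_col) %>%
    dplyr::summarise(N = dplyr::n())
}

# Case 2: the column name is fixed in the function body, so there is
# nothing to quote or unquote (this is the situation in the question).
count_by_y <- function(data) {
  data %>%
    dplyr::group_by(.data$Y) %>%
    dplyr::summarise(N = dplyr::n())
}

df <- data.frame(X = c("A", "B", "C", "D", "E"), Y = c(1, 2, 3, 1, 1))
by_arg   <- count_by(df, Y)   # column supplied by the caller
by_fixed <- count_by_y(df)    # column hard-coded
```

Both calls should give the same counts per value of Y. Only the first function needs enquo() and !!, because only there is the column unknown when the function is written.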

Still, some important issues remain: issues of NAMESPACE. You deal with NAMESPACE any time you use functions from other packages inside functions of your own package. NAMESPACE is not an easy topic, but if you are writing packages it pays to learn a bit about it. I recommend reading r-pkgs.had.co.nz/namespace.html: find the section "Imports" and read its introduction and the subheading "R functions". That will help you understand the steps, code and comments that I post below.

Follow these steps to fix your problem:
- Add dplyr, magrittr, rlang, and tidyr to the Imports field of DESCRIPTION.
- Refer to functions as PACKAGE::FUNCTION().
- Remove all !! and := because in this case you don't need them.
- Import and export the pipe from magrittr.
- Import .data from rlang, and refer to columns as .data$COLUMN.
- Pass any names that check() still flags as global variables to utils::globalVariables().
- Rebuild, reload, recheck.

# I've shortened your function to focus on the important details.
myfunction <- function(){
  df <- data.frame(
    X = c("A", "B", "C", "D", "E"),
    Y = c(1, 2, 3, 1, 1)
  )
  df %>%
    dplyr::group_by(.data$Y) %>%
    dplyr::summarise(N = n())
}

# Fix check() notes

#' @importFrom magrittr %>%
#' @export
magrittr::`%>%`

#' @importFrom rlang .data
NULL

utils::globalVariables(c(".data", "n"))
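
The same approach should carry over to the full pipeline from the question. What follows is only a sketch under some assumptions. I'm assuming a recent dplyr where dplyr::n() can be called with its namespace prefix, so "n" would not need to go into globalVariables(). This may not hold on releases as old as the 0.7.x series mentioned in the question. I'm also assuming a tidyr whose complete() accepts the .data pronoun. The name myfunction_full is made up for illustration, and the pipe is bound locally only so that the snippet runs as a plain script. In a package you would import it as shown above.

```r
# Script-only convenience; in a package use @importFrom magrittr %>% instead.
`%>%` <- magrittr::`%>%`

# Hypothetical name: the full pipeline from the question, with every
# function namespaced and every column referenced through .data.
myfunction_full <- function() {
  df <- data.frame(
    X = c("A", "B", "C", "D", "E"),
    Y = c(1, 2, 3, 1, 1)
  )
  df %>%
    dplyr::group_by(.data$Y) %>%
    dplyr::summarise(N = dplyr::n()) %>%               # count rows per Y
    dplyr::mutate(Y = factor(.data$Y, levels = 1:5)) %>%
    tidyr::complete(.data$Y, fill = list(N = 0)) %>%   # add missing levels 4 and 5
    dplyr::arrange(.data$Y) %>%
    dplyr::rename(newY = "Y") %>%                      # rename accepts a string
    dplyr::mutate(Y = as.integer(.data$newY))
}

new_df <- myfunction_full()
```

No !! or := appears anywhere, because all column names are known when the function is written.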

Mauro Lepore answered Sep 29 '22 14:09