I've looked at other answers but cannot find a solution for the code below to work. Basically, I'm creating a function that inner_join
the two data frame and filter
based on a column inputted in the function.
The problem is that the filter
part of the function doesn't work. However it works if I take filter off the function and append it like mydiff("a") %>% filter(a.x != a.y)
Any suggestion is helpful.
Note that I am function input in quotes
library(dplyr)
# fake data
df1<- tibble(id = seq(4,19,2),
a = c("a","b","c","d","e","f","g","h"),
b = c(rep("foo",3), rep("bar",5)))
df2<- tibble(id = seq(10, 20, 1),
a = c("d","a", "e","f","k","m","g","i","h", "a", "b"),
b = c(rep("bar", 7), rep("foo",4)))
# What I am trying to do
dplyr::inner_join(df1, df2, by = "id") %>% select(id, b.x, b.y) %>% filter(b.x!=b.y)
#> # A tibble: 1 x 3
#> id b.x b.y
#> <dbl> <chr> <chr>
#> 1 18 bar foo
# creating a function so that I can filter by difference in column if I have more columns
mydiff <- function(filteron, df_1 = df1, df_2 = df2){
require(dplyr, warn.conflicts = F)
col_1 = paste0(quo_name(filteron), "x")
col_2 = paste0(quo_name(filteron), "y")
my_df<- inner_join(df_1, df_2, by = "id", suffix = c("x", "y"))
my_df %>% select(id, col_1, col_2) %>% filter(col_1 != col_2)
}
# the filter part is not working as expected.
# There is no difference whether i pipe filter or leave it out
mydiff("a")
#> # A tibble: 5 x 3
#> id ax ay
#> <dbl> <chr> <chr>
#> 1 10 d d
#> 2 12 e e
#> 3 14 f k
#> 4 16 g g
#> 5 18 h h
The reason it did not work in your original function was that col_1
was string
but dplyr::filter()
expected "unquoted" input variable for the LHS. Thus, you need to first convert col_1
to variable using sym()
then unquote it inside filter
using !!
(bang bang).
rlang
has really nice function qq_show
to show what actually happens with quoting/unquoting (see the output below)
See also this similar question
library(rlang)
library(dplyr)
# creating a function that can take either string or symbol as input
mydiff <- function(filteron, df_1 = df1, df_2 = df2) {
col_1 <- paste0(quo_name(enquo(filteron)), "x")
col_2 <- paste0(quo_name(enquo(filteron)), "y")
my_df <- inner_join(df_1, df_2, by = "id", suffix = c("x", "y"))
cat('\nwithout sym and unquote\n')
qq_show(col_1 != col_2)
cat('\nwith sym and unquote\n')
qq_show(!!sym(col_1) != !!sym(col_2))
cat('\n')
my_df %>%
select(id, col_1, col_2) %>%
filter(!!sym(col_1) != !!sym(col_2))
}
### testing: filteron as a string
mydiff("a")
#>
#> without sym and unquote
#> col_1 != col_2
#>
#> with sym and unquote
#> ax != ay
#>
#> # A tibble: 1 x 3
#> id ax ay
#> <dbl> <chr> <chr>
#> 1 14 f k
### testing: filteron as a symbol
mydiff(a)
#>
#> without sym and unquote
#> col_1 != col_2
#>
#> with sym and unquote
#> ax != ay
#>
#> # A tibble: 1 x 3
#> id ax ay
#> <dbl> <chr> <chr>
#> 1 14 f k
Created on 2018-09-28 by the reprex package (v0.2.1.9000)
From https://dplyr.tidyverse.org/articles/programming.html
Most dplyr functions use non-standard evaluation (NSE). This is a catch-all term that means they don't follow the usual R rules of evaluation.
This can sometimes create a few issues when attempting to wrap them in functions. Here is a base version of the function you created.
mydiff<- function(filteron, df_1=df1, df_2 = df2){
col_1 = paste0(filteron,"x")
col_2 = paste0(filteron, "y")
my_df <- merge(df1, df2, by="id", suffixes = c("x","y"))
my_df[my_df[, col_1] != my_df[, col_2], c("id", col_1, col_2)]
}
> mydiff("a")
id ax ay
3 14 f k
> mydiff("b")
id bx by
5 18 bar foo
This will solve your problem and will likely work as one expects, now and in the future. With less dependencies on outside packages, you reduce these kind of issues and other quirks which may develop in the future as the package authors evolve their work.
Seems an evaluation issue to me. Try this modified mydiff
function, using the lazyeval
package:
mydiff <- function(filteron, df_1 = df1, df_2 = df2){
require(dplyr, warn.conflicts = F)
col_1 <- paste0(quo_name(filteron), "x")
col_2 <- paste0(quo_name(filteron), "y")
criteria <- lazyeval::interp(~ x != y, .values = list(x = as.name(col_1), y = as.name(col_2)))
my_df <- inner_join(df_1, df_2, by = "id", suffix = c("x", "y"))
my_df %>% select(id, col_1, col_2) %>% filter_(criteria)
}
You can take a look at the Functions chapter from Hadley Wickham’s book Advanced R for more on that.
The advice of using base R for simple functions is good, however it does not scale to more complex tidyverse functions and you lose the portability to dplyr backends like databases. If you want to create functions around tidyverse pipelines, you'll have to learn a bit about R expressions and the unquoting operator !!
. I recommend skimming over the first sections of https://tidyeval.tidyverse.org to get a rough idea of the concepts used here.
Since the function you'd like to create takes a bare column name and does not involve complex expressions (like you would pass to mutate()
or summarise()
), we don't need fancy stuff like quosures. We can work with symbols. To create a symbol, use as.name()
or rlang::sym()
.
as.name("mycolumn")
#> mycolumn
rlang::sym("mycolumn")
#> mycolumn
The latter has the advantage of being part of a larger family of functions: ensym()
, and the plural variants syms()
and ensyms()
. We are going to use ensym()
to capture a column name, i.e. delay the execution of the column in order to pass it to dplyr after a few transformations. Delaying the execution is called "quoting".
I have made a few changes to the interface of your function:
Take the data frames first for consistency with dplyr functions
Don't provide defaults for the data frames. These defaults are making too many assumptions.
Make by
and suffix
user-configurable, with reasonable defaults.
Here is the code, with explanations inline:
mydiff <- function(df1, df2, var, by = "id", suffix = c(".x", ".y")) {
stopifnot(is.character(suffix), length(suffix) == 2)
# Let's start by the easy task, joining the data frames
df <- dplyr::inner_join(df1, df2, by = by, suffix = suffix)
# Now onto dealing with the diff variable. `ensym()` takes a column
# name and delays its execution:
var <- rlang::ensym(var)
# A delayed column name is not a string, it's a symbol. So we need
# to transform it to a string in order to work with paste() etc.
# `quo_name()` works in this case but is generally only for
# providing default names.
#
# Better use base::as.character() or rlang::as_string() (the latter
# works a bit better on Windows with foreign UTF-8 characters):
var_string <- rlang::as_string(var)
# Now let's add the suffix to the name:
col1_string <- paste0(var_string, suffix[[1]])
col2_string <- paste0(var_string, suffix[[2]])
# dplyr::select() supports column names as strings but it is an
# exception in the dplyr API. Generally, dplyr functions take bare
# column names, i.e. symbols. So let's transform the strings back to
# symbols:
col1 <- rlang::sym(col1_string)
col2 <- rlang::sym(col2_string)
# The delayed column names now need to be inserted back into the
# dplyr code. This is accomplished by unquoting with the !!
# operator:
df %>%
dplyr::select(id, !!col1, !!col2) %>%
dplyr::filter(!!col1 != !!col2)
}
mydiff(df1, df2, b)
#> # A tibble: 1 x 3
#> id b.x b.y
#> <dbl> <chr> <chr>
#> 1 18 bar foo
mydiff(df1, df2, "a")
#> # A tibble: 1 x 3
#> id a.x a.y
#> <dbl> <chr> <chr>
#> 1 14 f k
You can also simplify the function by taking strings instead of bare column names. In this version, I'll use syms()
to create a list of symbols, and !!!
to pass it all at once to select()
:
mydiff2 <- function(df1, df2, var, by = "id", suffix = c(".x", ".y")) {
stopifnot(
is.character(suffix), length(suffix) == 2,
is.character(var), length(var) == 1
)
# Create a list of symbols from a character vector:
cols <- rlang::syms(paste0(var, suffix))
df <- dplyr::inner_join(df1, df2, by = by, suffix = suffix)
# Unquote the whole list as once with the big bang !!!
df %>%
dplyr::select(id, !!!cols) %>%
dplyr::filter(!!cols[[1]] != !!cols[[2]])
}
mydiff2(df1, df2, "a")
#> # A tibble: 1 x 3
#> id a.x a.y
#> <dbl> <chr> <chr>
#> 1 14 f k
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With