I am trying to perform an inner join of two tables using dplyr, and I think I'm getting tripped up by non-standard evaluation rules. When using the by=c("a" = "b") argument, everything works as expected when "a" and "b" are actual strings. Here's a toy example that works:
library(dplyr)
data(iris)
inner_join(iris, iris, by=c("Sepal.Length" = "Sepal.Width"))
But let's say I was putting inner_join in a function:
library(dplyr)
data(iris)
myfn <- function(xname, yname) {
  data(iris)
  inner_join(iris, iris, by=c(xname = yname))
}
myfn("Sepal.Length", "Sepal.Width")
This returns the following error:
Error: cannot join on columns 'xname' x 'Sepal.Width': index out of bounds
I suspect there is some fancy expression, deparsing, quoting, or unquoting that I could do to make this work, but I'm a bit murky on those details.
You can use
myfn <- function(xname, yname) {
  data(iris)
  inner_join(iris, iris, by=setNames(yname, xname))
}
The syntax suggested in the ?inner_join documentation,
by = c("a"="b") # same as by = c(a="b")
is slightly misleading because the two sides of the equals sign are not treated the same way. You're actually creating a named character vector: the left side is a name, which is taken literally and never evaluated, while the right side is an ordinary value. That is why c(xname = yname) produces a vector whose name is "xname" rather than the contents of xname, which matches the 'xname' in your error message. To set the names of the vector dynamically, use setNames().
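To see the difference without running a join at all, here is a minimal sketch in base R. The variable names literal and dynamic are only illustrative; xname and yname stand in for the function arguments from the question.

```r
xname <- "Sepal.Length"
yname <- "Sepal.Width"

# The name on the left of `=` is taken literally, not looked up as a variable
literal <- c(xname = yname)
names(literal)   # "xname"

# setNames() evaluates both arguments, so the name comes from the variable
dynamic <- setNames(yname, xname)
names(dynamic)   # "Sepal.Length"

# The dynamic version is the same vector you would get by typing the strings by hand
identical(dynamic, c("Sepal.Length" = "Sepal.Width"))  # TRUE
```

Since by only needs a named character vector, anything that builds one with the right names should work; setNames() is just the shortest way to do it.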