I am writing a function that filters, groups and mutates data with dplyr functions. The basic pipe sequence works fine outside a function, where I use the actual column names. Inside a function, where the column name is held in a variable, some of the dplyr functions still work but others don't, most notably dplyr::filter(). For example:
var1 <- c('yes', NA, NA, 'yes', 'yes', NA, NA, NA, 'yes', NA, 'no', 'no', 'no', 'maybe', NA, 'maybe', 'maybe', 'maybe')
var2 <- c(1:18)
df <- data.frame(var1, var2)
This works (i.e. it filters out the NAs):
df %>% filter(!is.na(var1))
...but this doesn't:
x <- "var1"
df %>% filter(!is.na(x))
...even though this does:
df %>% select(x)
Specifically, I need to filter out the NAs.
I tried get("x"), which didn't help, and base subsetting:
df[!is.na(x),]
...which didn't work either.
How can I pass a column name stored in a variable to filter(), inside or outside a function, and why does a variable work with other dplyr functions such as select()?
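dplyr functions use non-standard evaluation: inside a verb such as filter(), bare names are looked up as columns of the data frame. That is why you do not have to quote column names in something like filter(mtcars, mpg > 20). A string, however, is not a column reference. In df %>% filter(!is.na(x)), x is not a column of df, so it is found in the calling environment as the string "var1"; !is.na("var1") is a single TRUE, which is recycled and keeps every row. The same happens with df[!is.na(x),], and get("x") fails for the same reason, since it simply returns the string stored in x. select() behaves differently because it uses tidyselect, which also accepts column names given as a character vector (recent versions ask you to wrap such an external vector in all_of()). So when you use dplyr inside functions with column names held in strings, you need to turn the string into a column reference explicitly.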
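A minimal base-R sketch of that explanation, reusing the question's data: x holds only the column name, so indexing with !is.na(x) keeps all 18 rows, while looking the column up with df[[x]] keeps the 11 non-NA rows. The names is_kept and kept are only for illustration.

```r
# Reproduce the question's data
var1 <- c('yes', NA, NA, 'yes', 'yes', NA, NA, NA, 'yes', NA,
          'no', 'no', 'no', 'maybe', NA, 'maybe', 'maybe', 'maybe')
var2 <- 1:18
df <- data.frame(var1, var2)

x <- "var1"

# x is the column *name*, so this is a single TRUE, not one value per row
is_kept <- !is.na(x)
print(is_kept)

# A single TRUE used as a row index is recycled: every row is kept (18)
nrow(df[!is.na(x), ])

# df[[x]] looks the column up by name, giving the intended rows (11)
kept <- df[!is.na(df[[x]]), ]
nrow(kept)
```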
Conditions on column values are how you filter and subset the data: a column can be required to match specific values or to fall within a range.
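As a sketch of both kinds of condition with the column names held in strings, this uses dplyr's .data pronoun (an alternative to the sym() approach, not something the original answer uses). The variable y and the bounds 5 and 12 are arbitrary choices for illustration.

```r
library(dplyr)

var1 <- c('yes', NA, NA, 'yes', 'yes', NA, NA, NA, 'yes', NA,
          'no', 'no', 'no', 'maybe', NA, 'maybe', 'maybe', 'maybe')
df <- data.frame(var1, var2 = 1:18)

x <- "var1"
y <- "var2"

# Specific occurrences: keep rows whose value is in a set (NA %in% ... is FALSE)
in_set <- df %>% filter(.data[[x]] %in% c("yes", "no"))

# Range: keep rows whose value lies in [5, 12]; between() is inclusive
in_range <- df %>% filter(between(.data[[y]], 5, 12))

in_set
in_range
```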
The filter() function subsets a data frame, retaining all rows that satisfy your conditions. To be retained, a row must produce TRUE for every condition. Note that when a condition evaluates to NA the row is dropped, unlike base subsetting with [, which returns such a row as a row of NAs.
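A small sketch of that NA behaviour on a hypothetical three-row data frame, toy, where the condition var1 == "yes" evaluates to TRUE, NA and FALSE:

```r
library(dplyr)

# Hypothetical toy data: the middle row has a missing var1
toy <- data.frame(var1 = c("yes", NA, "no"), var2 = 1:3)

# filter() treats the NA condition like FALSE and drops that row
via_filter <- filter(toy, var1 == "yes")

# Base [ returns the NA condition as an extra row filled with NAs
via_bracket <- toy[toy$var1 == "yes", ]

nrow(via_filter)   # 1
nrow(via_bracket)  # 2
```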
We can use sym() to convert the string to a symbol and then unquote it with UQ() (written !! in current rlang) so that filter() evaluates it as the column:
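library(rlang)
library(dplyr)
x <- "var1"
# sym() turns the string into a symbol; !! unquotes it so filter() sees the
# column var1 (older rlang code spells this UQ(sym(x)))
df %>%
  filter(!is.na(!!sym(x)))
#    var1 var2
#1    yes    1
#2    yes    4
#3    yes    5
#4    yes    9
#5     no   11
#6     no   12
#7     no   13
#8  maybe   14
#9  maybe   16
#10 maybe   17
#11 maybe   18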
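Since the question's goal is a function that filters, groups and mutates, here is a sketch of the same sym()/!! idea wrapped in a function. filter_group_mutate and the n_in_group column are hypothetical names; the helper drops NAs in the chosen column, groups by it, and adds the number of rows in each group.

```r
library(dplyr)
library(rlang)

var1 <- c('yes', NA, NA, 'yes', 'yes', NA, NA, NA, 'yes', NA,
          'no', 'no', 'no', 'maybe', NA, 'maybe', 'maybe', 'maybe')
df <- data.frame(var1, var2 = 1:18)

# Hypothetical helper: `col` is a column name given as a string
filter_group_mutate <- function(data, col) {
  col_sym <- sym(col)                # string -> symbol
  data %>%
    filter(!is.na(!!col_sym)) %>%    # !! unquotes: filter() sees the column
    group_by(!!col_sym) %>%          # the same symbol works for grouping
    mutate(n_in_group = n()) %>%     # number of rows in each group
    ungroup()
}

result <- filter_group_mutate(df, "var1")
result
```

Because the helper takes a string, it can be called in a loop or with a name computed at run time, which is the situation the question describes.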