
Why doesn't dplyr filter() work within function (i.e. using variable for column name)?

Tags: r, filter, dplyr

I am writing a function that filters, groups and mutates data with dplyr functions. The basic pipe sequence works fine outside a function, where I use the actual column names. Inside a function, where the column name is held in a variable, some of the dplyr functions work but others don't, most notably dplyr::filter(). For example:

var1 <- c('yes', NA, NA, 'yes', 'yes', NA, NA, NA, 'yes', NA, 'no', 'no', 'no', 'maybe', NA, 'maybe', 'maybe', 'maybe')

var2 <- c(1:18)

df <- data.frame(var1, var2)

This works fine (i.e. it filters out the NAs):

df%>%filter(!is.na(var1))

...but this doesn't:

x <- "var1"

df%>%filter(!is.na(x))

...but this does:

df%>%select(x)

It is specifically the NAs that need to be filtered out.

I tried get("x"), which did not work, and base subsetting:

df[!is.na(x),]

...no good, either.

Any ideas on how to pass a variable to filter() inside (or outside) a function, and why a variable works with other dplyr functions?

Conner M. asked Jul 23 '17 03:07



1 Answer

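We can use sym() to convert the string to a symbol and then unquote it with UQ(), so that filter() evaluates it as a column of the data frame rather than as a literal string: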

library(rlang)
library(dplyr)
df %>%
   filter(!is.na(UQ(sym(x))))
#     var1 var2
#1    yes    1
#2    yes    4
#3    yes    5
#4    yes    9
#5     no   11
#6     no   12
#7     no   13
#8  maybe   14
#9  maybe   16
#10 maybe   17
#11 maybe   18
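
For context, filter(!is.na(x)) does not fail with an error: x is a length-one string, so is.na(x) is FALSE and the condition is TRUE for every row, which means nothing is dropped. The same applies to df[!is.na(x),]. The following is a minimal sketch of some equivalent spellings, assuming dplyr 0.7.0 or later (which also installs rlang); drop_na_rows is a hypothetical helper name used only for illustration. Later rlang releases deprecated UQ() in favor of the !! operator, so the second option is the same idea as above in the newer syntax.

```r
library(dplyr)

var1 <- c('yes', NA, NA, 'yes', 'yes', NA, NA, NA, 'yes', NA,
          'no', 'no', 'no', 'maybe', NA, 'maybe', 'maybe', 'maybe')
df <- data.frame(var1, var2 = 1:18)
x <- "var1"

# The string itself is never NA, so the original condition is TRUE for all rows.
is.na(x)

# Option 1: the .data pronoun looks the column up by its name.
res_data <- df %>% filter(!is.na(.data[[x]]))

# Option 2: turn the string into a symbol and unquote it with !!.
res_sym <- df %>% filter(!is.na(!!rlang::sym(x)))

# Option 3: base R, indexing the column by name before testing for NA.
res_base <- df[!is.na(df[[x]]), ]

# Hypothetical wrapper (not part of dplyr): takes the column name as a string.
drop_na_rows <- function(data, col) {
  data %>% filter(!is.na(.data[[col]]))
}
res_fun <- drop_na_rows(df, "var1")
```

All four results should contain the same rows as the output shown above. The .data[[x]] form is often the simplest choice inside a function because it needs no extra conversion step.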
akrun answered Nov 14 '22 21:11