How to use a variable in dplyr::filter?

Tags: r, dplyr

I have a variable with the same name as a column in a dataframe:

df <- data.frame(a=c(1,2,3), b=c(4,5,6))
b <- 5

I want to get the rows where df$b == b, but dplyr interprets this as df$b == df$b:

df %>% filter(b == b) # interpreted as df$b == df$b
#   a b
# 1 1 4
# 2 2 5
# 3 3 6

If I change the variable name, it works:

B <- 5
df %>% filter(b == B) # interpreted as df$b == B
#   a b
# 1 2 5

I'm wondering if there is a better way to tell filter that b refers to an outside variable.

asked Dec 11 '15 09:12 by nachocab (311 likes)


People also ask

What does %>% do in dplyr?

%>% is the forward pipe operator. It forwards a value, or the result of an expression, into the next function call, which lets you chain commands from left to right. It is defined by the magrittr package (CRAN) and is heavily used by dplyr (CRAN).
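As a minimal sketch of that forwarding behaviour (the vector x and the result names are made up for illustration), the piped form below should give the same result as the nested call:

```r
library(magrittr)  # provides %>%; dplyr re-exports it as well

x <- c(1, 4, 9)  # hypothetical example data

# Piped form: x is forwarded as the first argument of each call in turn.
piped <- x %>% sqrt() %>% sum()

# Equivalent nested form, read from the inside out.
nested <- sum(sqrt(x))

identical(piped, nested)
```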

How does dplyr filter work?

The filter() function is used to subset a data frame, retaining all rows that satisfy your conditions. To be retained, the row must produce a value of TRUE for all conditions. Note that when a condition evaluates to NA the row will be dropped, unlike base subsetting with [.
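The difference in NA handling can be sketched as follows. The scores data frame is invented for this example; the point is only to compare filter() with base [ on the same condition:

```r
library(dplyr)

# Hypothetical data with one missing score.
scores <- data.frame(id = 1:3, score = c(10, NA, 30))

# filter() keeps only rows where the condition is TRUE.
kept_by_filter <- scores %>% filter(score > 5)

# Base [ keeps a placeholder row of NAs where the condition is NA.
kept_by_bracket <- scores[scores$score > 5, ]

nrow(kept_by_filter)
nrow(kept_by_bracket)
```

The two row counts differ because base subsetting turns the NA condition into an all-NA row instead of dropping it.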

How do I filter multiple values in R dplyr?

To keep rows whose column matches any of several values, pass the data frame to filter() and write a condition of the form column %in% values, where values is a vector containing every value you want in the result.
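A short sketch of this pattern, using a made-up fruit data frame and a hypothetical wanted vector:

```r
library(dplyr)

# Hypothetical example data.
fruit <- data.frame(
  name = c("apple", "banana", "cherry", "date"),
  qty  = c(3, 5, 2, 7)
)

# The values to keep; any vector of the column's type works here.
wanted <- c("apple", "date")

# Keep rows whose name appears anywhere in wanted.
picked <- fruit %>% filter(name %in% wanted)
picked
```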

What does filter () do in R?

dplyr's filter() function subsets a data frame based on a provided condition. A row is kept only if the condition evaluates to TRUE for it; rows where the condition evaluates to FALSE or NA are dropped.


2 Answers

Recently I have found this to be an elegant solution to this problem, although I'm just starting to wrap my head around how it works.

df %>% filter(b == !!b)

which is syntactic sugar for

df %>% filter(b == UQ(b))

At a high level, the unquote operation (UQ, or !!) causes its contents to be evaluated before the filter operation runs, so b is looked up in the calling environment rather than within the data.frame. In newer rlang releases UQ() is deprecated, so prefer the !! form.

This is described in the chapter of Advanced R on 'quasi-quotation', which also includes a few solutions to similar problems related to non-standard evaluation (NSE).
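For reference, here is a self-contained sketch of the unquoting approach on the data from the question. It also shows the .env pronoun, which is not part of the answer above; it is included on the assumption that a reasonably recent dplyr (1.0 or later) is installed. The result names with_bang and with_env are made up for illustration.

```r
library(dplyr)

df <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6))
b <- 5

# !! evaluates b outside the data frame first, so the condition
# becomes b == 5, where the remaining b is the column.
with_bang <- df %>% filter(b == !!b)

# .env$b states explicitly that b should come from the calling
# environment rather than from the data frame.
with_env <- df %>% filter(b == .env$b)

with_bang
with_env
```

Both forms should return the single row where the column b equals 5. The .env form may be easier to read, since it spells out where the value comes from.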

answered Sep 29 '22 13:09 by jackinovik (198 likes)


You could use the get function to fetch the value of the variable from the environment.

df %>% filter(b == get("b")) # Note the "" around b 
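Depending on the dplyr version, a bare get("b") inside filter() may find the column b before the outside variable, in which case every row is returned. A more defensive sketch passes the environment explicitly. The envir argument is part of base R's get(); the name outer_env is made up for this example, and this variation is not the answerer's original code.

```r
library(dplyr)

df <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6))
b <- 5

# Remember the environment that holds the outside b,
# so get() does not have to guess where to look.
outer_env <- environment()

# get() now looks b up in outer_env instead of the data frame.
matched <- df %>% filter(b == get("b", envir = outer_env))
matched
```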

answered Sep 29 '22 15:09 by nist (38 likes)