 

If a variable has the same name as a data frame column, dplyr's filter() does not work

I have a simple loop that iterates over a number of string values in a vector called measurements:

measurements <- c("A","B","C","D")

Here is a reproducible data frame:

library(dplyr)

value <- c(1,2,3,4)
measurement <- c("A","B","C","D")
questiondata <- data.frame(measurement, value)
questiondata <- as_tibble(questiondata)

First, the loop filters rows based on the measurement column. If the loop variable has the same name as a column of my data frame, the filter does not work: it prints the entire data frame 4 times:

for (measurement in measurements){
  print(measurement)
  print(questiondata %>% dplyr::filter(measurement == measurement))
}

If instead I change the variable name (from measurement to m, for instance), it works:

for (m in measurements){
  print(m)
  print(questiondata %>% dplyr::filter(measurement == m))
}

Does anyone know the reason for this behaviour?

spleen asked Sep 12 '25 15:09


1 Answer

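This issue results from the ambiguity between data-variables (columns of the data frame) and env-variables (objects in the calling environment) in data-masking functions like filter(). When a name exists in both places, the data-variable takes precedence.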
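A minimal sketch of that lookup order, using a hypothetical tibble df and hypothetical variables x and threshold (none of these come from the question): a bare name is looked up among the columns first, and only falls back to the environment when no column has that name.

```r
library(dplyr)

# A global (env-variable) and a column (data-variable) that share the name `x`
x <- 100
df <- tibble(x = 1:3)

# The bare name `x` resolves to the column, so only rows 2 and 3 are kept.
# If it resolved to the global 100, the condition would be TRUE for every row.
from_column <- filter(df, x > 1)

# `threshold` is not a column, so it is found in the calling environment
threshold <- 2
from_env <- filter(df, x > threshold)

print(from_column)
print(from_env)
```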

In the following code, both occurrences of measurement refer to the measurement column of questiondata, so the condition compares the column with itself, is TRUE for every row, and no rows are filtered out.

questiondata %>% filter(measurement == measurement)

# # A tibble: 4 × 2
#   measurement value
#   <chr>       <dbl>
# 1 A               1
# 2 B               2
# 3 C               3
# 4 D               4
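As a rough base R illustration (an analogy, not dplyr's actual internals), the condition above amounts to comparing the column with itself element by element, regardless of what the env-variable measurement holds:

```r
library(dplyr)

questiondata <- tibble(measurement = c("A", "B", "C", "D"), value = c(1, 2, 3, 4))
measurement <- "D"  # env-variable with the same name as the column

# Equivalent of the masked expression: column == column
cond <- questiondata$measurement == questiondata$measurement
print(cond)  # all four elements are TRUE

# Hence filter() keeps every row
filtered <- filter(questiondata, measurement == measurement)
print(filtered)
```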

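You can use the .env pronoun to state explicitly that an object should be looked up in the environment rather than in the data. After the loop has finished, measurement holds "D", its last value, so only the D row is kept: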

questiondata %>% filter(measurement == .env$measurement)

# # A tibble: 1 × 2
#   measurement value
#   <chr>       <dbl>
# 1 D               4
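Applied to the loop from the question, a sketch could look like this. Collecting the results into a list named results is an assumption for illustration (the question only prints them); the .data pronoun is added to make the column reference explicit as well.

```r
library(dplyr)

measurements <- c("A", "B", "C", "D")
questiondata <- tibble(measurement = c("A", "B", "C", "D"), value = c(1, 2, 3, 4))

results <- list()
for (measurement in measurements) {
  # .data$measurement is the column; .env$measurement is the loop variable
  results[[measurement]] <- questiondata %>%
    filter(.data$measurement == .env$measurement)
}

print(results)
```

Each element of results now holds a single row. Alternatively, filter(measurement == !!measurement) injects the value of the env-variable before the expression is evaluated, which has the same effect.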
Darren Tsai answered Sep 14 '25 07:09