I have a simple loop that iterates over a number of string values in a vector called measurements:
measurements <- c("A","B","C","D")
Here is a reproducible data frame:
value <- c(1,2,3,4)
measurement <- c("A","B","C","D")
questiondata <- data.frame(measurement, value)
questiondata <- tibble::as_tibble(questiondata)
The loop filters rows based on the measurement column. If the loop variable has the same name as a column of my data frame, the filter does not work: it prints the entire data frame 4 times:
for (measurement in measurements){
print(measurement)
print(questiondata %>% dplyr::filter(measurement == measurement))
}
If, instead, I change the variable name (from "measurement" to "m", for instance), it works:
for (m in measurements){
print(m)
print(questiondata %>% dplyr::filter(measurement == m))
}
Does anyone know the reason for this behaviour?
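This issue results from the ambiguity between data-variables and env-variables in data-masked functions like filter(). A bare name is looked up in the data frame's columns first, and in the calling environment only if no column has that name.
In the following code, both occurrences of measurement refer to the measurement column of questiondata, so the condition is true for every row and no rows are filtered out.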
questiondata %>% filter(measurement == measurement)
# # A tibble: 4 × 2
# measurement value
# <chr> <dbl>
# 1 A 1
# 2 B 2
# 3 C 3
# 4 D 4
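The lookup order can be checked directly. The following is a minimal sketch, assuming dplyr is installed; the names masked and unmasked are introduced here only for illustration, and the data frame is rebuilt so the snippet runs on its own.

```r
library(dplyr)

# Same data as in the question, rebuilt so the snippet is self-contained.
questiondata <- tibble::tibble(
  measurement = c("A", "B", "C", "D"),
  value = c(1, 2, 3, 4)
)

measurement <- "B"  # env-variable that shares its name with a column
m <- "B"            # env-variable with no matching column

# Both sides resolve to the column, so the comparison is TRUE for every row.
masked <- questiondata %>% filter(measurement == measurement)
nrow(masked)    # 4

# There is no column called `m`, so the lookup falls through to the environment.
unmasked <- questiondata %>% filter(measurement == m)
nrow(unmasked)  # 1
```

This is why renaming the loop variable to m makes the original loop behave as expected.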
You can use the .env pronoun to state explicitly that an object should be looked up in the calling environment rather than in the data.
questiondata %>% filter(measurement == .env$measurement)
# # A tibble: 1 × 2
# measurement value
# <chr> <dbl>
# 1 D 4
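Applied to the loop from the question, this might look like the sketch below. It assumes dplyr is installed; the names by_pronoun and by_injection are hypothetical. The .data pronoun is used alongside .env to make the column side explicit as well, and injecting the value with !! is shown as an alternative that should give the same rows.

```r
library(dplyr)

# Same data and vector as in the question, rebuilt so the snippet is self-contained.
measurements <- c("A", "B", "C", "D")
questiondata <- tibble::tibble(
  measurement = c("A", "B", "C", "D"),
  value = c(1, 2, 3, 4)
)

for (measurement in measurements) {
  # .data$ forces the column, .env$ forces the loop variable.
  by_pronoun <- questiondata %>%
    filter(.data$measurement == .env$measurement)

  # Alternative: !! injects the loop variable's current value into the expression.
  by_injection <- questiondata %>%
    filter(measurement == !!measurement)

  print(by_pronoun)  # one row per iteration
}
```

With either form the loop variable can keep the same name as the column.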