I'm using dplyr to replace the value with NA if a condition is met, but it's putting NA in places where it shouldn't be.
dput:
df <- structure(list(id = c("USC00231275", "USC00231275", "USC00231275",
"USC00231275", "USC00231275", "USC00231275", "USC00231275", "USC00231275",
"USC00231275", "USC00231275"), element = c("TMAX", "TMIN", "TMAX",
"TMIN", "TMAX", "TMIN", "TMAX", "TMIN", "TMAX", "TMIN"), year = c(1937,
1937, 1937, 1937, 1937, 1937, 1937, 1937, 1937, 1937), month = c(5,
5, 5, 5, 5, 5, 5, 5, 5, 5), day = c(1, 1, 2, 2, 3, 3, 4, 4, 5,
5), date = structure(c(-11933, -11933, -11932, -11932, -11931,
-11931, -11930, -11930, -11929, -11929), class = "Date"), value = c(0,
53.96, 68, 44.96, 62.06, 53.96, 73.04, 53.96, 69.08, 50)), .Names = c("id",
"element", "year", "month", "day", "date", "value"), row.names = c(NA,
10L), class = "data.frame")
data.frame
(Note: the condition is only met on rows 1 and 2)
            id element year month day       date value
1  USC00231275    TMAX 1937     5   1 1937-05-01  0.00
2  USC00231275    TMIN 1937     5   1 1937-05-01 53.96
3  USC00231275    TMAX 1937     5   2 1937-05-02 68.00
4  USC00231275    TMIN 1937     5   2 1937-05-02 44.96
5  USC00231275    TMAX 1937     5   3 1937-05-03 62.06
6  USC00231275    TMIN 1937     5   3 1937-05-03 53.96
7  USC00231275    TMAX 1937     5   4 1937-05-04 73.04
8  USC00231275    TMIN 1937     5   4 1937-05-04 53.96
9  USC00231275    TMAX 1937     5   5 1937-05-05 69.08
10 USC00231275    TMIN 1937     5   5 1937-05-05 50.00
dplyr
df %>%
  group_by(date) %>%
  mutate(
    value = if (value[element == 'TMIN'] >= value[element == 'TMAX'])
      as.numeric(NA) else value
  )
            id element  year month   day       date value
         (chr)   (chr) (dbl) (dbl) (dbl)     (date) (dbl)
1  USC00231275    TMAX  1937     5     1 1937-05-01    NA
2  USC00231275    TMIN  1937     5     1 1937-05-01    NA
3  USC00231275    TMAX  1937     5     2 1937-05-02 68.00
4  USC00231275    TMIN  1937     5     2 1937-05-02 44.96
5  USC00231275    TMAX  1937     5     3 1937-05-03    NA
6  USC00231275    TMIN  1937     5     3 1937-05-03    NA
7  USC00231275    TMAX  1937     5     4 1937-05-04 73.04
8  USC00231275    TMIN  1937     5     4 1937-05-04 53.96
9  USC00231275    TMAX  1937     5     5 1937-05-05 69.08
10 USC00231275    TMIN  1937     5     5 1937-05-05 50.00
Notice that the only rows that should change are 1 and 2, but dplyr changed rows 5 and 6 even though the conditions were not met.
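The code below should do what you are trying to do. It writes the result to a new column, new_value, and sets only the TMIN row to NA when TMIN >= TMAX for that date:
df %>%
  group_by(date) %>%
  mutate(new_value = ifelse(
    (value[element == 'TMIN'] >= value[element == 'TMAX']) & element == 'TMIN',
    NA, value)) %>%
  ungroup()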
As for whether this is a bug, I don't think it is. Looking at just the data for the one date where TMIN >= TMAX, you get the following:
df %>%
  filter(date == '1937-05-01') %>%
  mutate(res = (value[element == 'TMIN'] >= value[element == 'TMAX'])) %>%
  mutate(new_value = ifelse((res & element == 'TMIN'), NA, value))
           id element year month day       date value  res new_value
1 USC00231275    TMAX 1937     5   1 1937-05-01  0.00 TRUE         0
2 USC00231275    TMIN 1937     5   1 1937-05-01 53.96 TRUE        NA
The construct value[element == 'TMIN'] >= value[element == 'TMAX'] evaluates to a single value per group, so here it is TRUE for every row, as can be seen in the res column. The code below breaks this down a bit to hopefully clarify.
### Just looking at one date
> df2 <- df %>% filter(date == '1937-05-01')
> df2
           id element year month day       date value
1 USC00231275    TMAX 1937     5   1 1937-05-01  0.00
2 USC00231275    TMIN 1937     5   1 1937-05-01 53.96
### These are the two values being compared (TMIN first, then TMAX).
### The comparison yields a single result that is recycled for every row
### in the group, so it is always TRUE or always FALSE within a group.
> c(df2$value[df2$element == 'TMIN'], df2$value[df2$element == 'TMAX'])
[1] 53.96  0.00
Since there is one comparison for the entire group, every row in that group sees the same result, either TRUE or FALSE.
The code that gives the correct result works around this by combining the group-level comparison with the per-row test element == 'TMIN'.
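To make the recycling concrete, here is a minimal base R sketch using the two values for 1937-05-01 from the data above. The names cond, row_test and new_value are only illustrative and do not come from the original code.

```r
# Values for a single date, copied from the first two rows of df.
element <- c("TMAX", "TMIN")
value   <- c(0, 53.96)

# The comparison collapses the whole group to one logical value.
cond <- value[element == "TMIN"] >= value[element == "TMAX"]
length(cond)

# Combining it with a per-row vector recycles that single value,
# giving one TRUE/FALSE per row.
row_test <- cond & element == "TMIN"

# ifelse() returns one result per element of its test, so only the
# TMIN row is replaced.
new_value <- ifelse(row_test, NA, value)
new_value
```

Here cond has length one, while row_test has one entry per row, which is why only the TMIN row ends up as NA.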
One possible final solution could be:
df %>%
  group_by(date) %>%
  mutate(value = ifelse(
    (value[element == 'TMIN'] >= value[element == 'TMAX']) & element == 'TMIN',
    NA, value)) %>%
  ungroup()
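The question says both rows 1 and 2 should change, while the code above only blanks the TMIN row. If you do want both rows for the date set to NA, one option might be to store the group-level comparison in a column first and then test that column per row. This is only a sketch, assuming every date has exactly one TMIN row and one TMAX row; df_small and bad are hypothetical names used for illustration, and NA_real_ is used so the result stays numeric in every group.

```r
library(dplyr)

# A cut-down copy of the first four rows of the example data.
df_small <- data.frame(
  date    = as.Date(c("1937-05-01", "1937-05-01", "1937-05-02", "1937-05-02")),
  element = c("TMAX", "TMIN", "TMAX", "TMIN"),
  value   = c(0, 53.96, 68, 44.96)
)

result <- df_small %>%
  group_by(date) %>%
  mutate(
    # One TRUE/FALSE per date, recycled to every row of that date.
    # Assumes exactly one TMIN and one TMAX row per date.
    bad   = value[element == "TMIN"] >= value[element == "TMAX"],
    # bad is now a per-row column, so ifelse() works row by row.
    value = ifelse(bad, NA_real_, value)
  ) %>%
  ungroup()

result$value
```

With this input, both rows for 1937-05-01 become NA and the rows for 1937-05-02 keep their values. Drop the bad column afterwards with select(-bad) if you don't need it.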