I want to conditionally replace some values in a data.frame.
Suppose I have:
a <- c(1, 4, 5, 7, 9, 8, 3, 90)
b <- c(21, 24, 25, NA, 9, 23, NA, 3)
c <- c(214, 5, NA, NA, 59, NA, 32, 12)
d <- rep(0, 8)
test.df <- data.frame(a, b, c, d)
test.df
   a  b   c d
1  1 21 214 0
2  4 24   5 0
3  5 25  NA 0
4  7 NA  NA 0
5  9  9  59 0
6  8 23  NA 0
7  3 NA  32 0
8 90  3  12 0
My first question is why the two commands below do not return the same result. Why does the second one return rows of NAs? What is my mistake in the second?
subset(test.df, test.df$a >=4 & !is.na(test.df$b) & test.df$c > 4)
   a  b  c d
2  4 24  5 0
5  9  9 59 0
8 90  3 12 0
test.df[test.df$a >=4 & !is.na(test.df$b) & test.df$c > 4, ]
      a  b  c  d
2     4 24  5  0
NA   NA NA NA NA
5     9  9 59  0
NA.1 NA NA NA NA
8    90  3 12  0
My second question is: based on the above criteria, how do I replace the values in column d with 10 to get:
test.df
   a  b   c  d
1  1 21 214  0
2  4 24   5 10
3  5 25  NA  0
4  7 NA  NA  0
5  9  9  59 10
6  8 23  NA  0
7  3 NA  32  0
8 90  3  12 10
?
Thanks!
1) Your criterion test.df$a >= 4 & !is.na(test.df$b) & test.df$c > 4 evaluates to:
[1] FALSE TRUE NA FALSE TRUE NA FALSE TRUE
As documented, subset filters out the rows (3 and 6) where the criterion evaluates to NA. On the other hand, [ gives you a row of NAs for each of these, because it cannot tell whether they should be included (TRUE) or excluded (FALSE).
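If you prefer to keep using [ rather than subset, one possible workaround is to wrap the criterion in which(), which returns only the positions that are TRUE and so drops both FALSE and NA. This is just a sketch; the names keep, idx and res are made up for illustration.

```r
# Rebuild the example data from the question
a <- c(1, 4, 5, 7, 9, 8, 3, 90)
b <- c(21, 24, 25, NA, 9, 23, NA, 3)
c <- c(214, 5, NA, NA, 59, NA, 32, 12)
d <- rep(0, 8)
test.df <- data.frame(a, b, c, d)

# 'keep' (hypothetical name) is the logical criterion; it contains NA
# wherever c is NA and the other conditions are TRUE
keep <- test.df$a >= 4 & !is.na(test.df$b) & test.df$c > 4

# which() keeps only the TRUE positions, so NA entries are dropped
idx <- which(keep)
res <- test.df[idx, ]
res
```

With the NA positions removed from the index, [ should return the same three rows (2, 5 and 8) that subset does.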
2) I would use transform and an improved criterion:
test.df <- transform(test.df, d = ifelse(!is.na(a) &
                                         !is.na(b) &
                                         !is.na(c) &
                                         a >= 4 &
                                         c > 4, 10, d))
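As an alternative sketch (not necessarily better than transform), the same replacement can be done with indexed assignment. It assumes the original criterion from the question and again uses which() to skip the NA cases; hit is a hypothetical name.

```r
# Rebuild the example data from the question
a <- c(1, 4, 5, 7, 9, 8, 3, 90)
b <- c(21, 24, 25, NA, 9, 23, NA, 3)
c <- c(214, 5, NA, NA, 59, NA, 32, 12)
d <- rep(0, 8)
test.df <- data.frame(a, b, c, d)

# 'hit' (hypothetical name) evaluates the criterion inside the data frame
hit <- with(test.df, a >= 4 & !is.na(b) & c > 4)

# Assign 10 only where the criterion is TRUE; which() ignores NA entries
test.df$d[which(hit)] <- 10
test.df
```

Rows where the criterion is NA (here rows 3 and 6) are left unchanged, so d stays 0 for them.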