I've got a data frame that contains some NAs, and when I subset it on a column I would like R to ignore the NAs in that column.
fake = data.frame(id = 1:5,
                  color = c('red', NA, NA, 'blue', 'blue'),
                  value = rnorm(5))
sub = fake[fake$color != 'red', ]
id color value
NA NA <NA> NA
NA.1 NA <NA> NA
4 4 blue -0.3227421
5 5 blue -1.0196561
The dataframe I want back is:
id color value
2 2 <NA> 0.2761862
3 3 <NA> 1.0029380
4 4 blue -0.3227421
5 5 blue -1.0196561
But for whatever reason, R NAs out the entire row when an NA in 'color' is encountered. I've tooled around with 'na.exclude,' 'na.pass,' etc., but haven't found a clean way to do this.
fake[!fake$color %in% "red",]
# id color value
# 2 2 <NA> -1.1341590
# 3 3 <NA> -0.6181337
# 4 4 blue 0.6115878
# 5 5 blue 1.3984797
Perhaps it is better to use setdiff in this case:
# seq_len(nrow(fake)) gives row positions, so this does not rely on row names being 1..n
fake[setdiff(seq_len(nrow(fake)), which(fake$color == "red")), ]
# id color value
# 2 2 <NA> 1.015132
# 3 3 <NA> -1.425210
# 4 4 blue 1.089207
# 5 5 blue 1.442323
You are getting tripped up by != returning NA rather than TRUE. This should succeed:
sub = fake[ is.na(fake$color) | fake$color != 'red', ]
Nothing is equal (==) to NA, and furthermore nothing is not-equal (!=) to NA, not even NA itself; both comparisons return NA. Notice:
> is.na(fake$color) | fake$color != 'red'
[1] FALSE TRUE TRUE TRUE TRUE
> NA == NA
[1] NA
But NAs can still give you what you want when combined using OR (|):
> NA | TRUE
[1] TRUE
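To tie the pieces together, here is a minimal sketch that compares the raw != index with the two NA-safe ones from this thread. The fixed numbers in value are made up so the result is reproducible (the question used rnorm), and the names keep_raw, keep_or and keep_in are just illustrative.

```r
# Example data mirroring the question; 'value' is hard-coded (an assumption)
# instead of rnorm(5) so the output is the same on every run.
fake <- data.frame(id = 1:5,
                   color = c("red", NA, NA, "blue", "blue"),
                   value = c(0.5, 0.2, 1.0, -0.3, -1.0))

# != propagates NA, and an NA in a logical row index produces an all-NA row.
keep_raw <- fake$color != "red"
bad <- fake[keep_raw, ]

# Option 1: treat NA explicitly as "keep", since TRUE | NA is TRUE.
keep_or <- is.na(fake$color) | fake$color != "red"

# Option 2: %in% returns FALSE (never NA) for an NA that is not in the table.
keep_in <- !(fake$color %in% "red")

sub_or <- fake[keep_or, ]
sub_in <- fake[keep_in, ]

print(keep_raw)
print(sub_or)
```

Both keep_or and keep_in should select rows 2 to 5, whereas keep_raw carries the NAs through into the row index, which is where the all-NA rows in the question come from.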