How to induce R to "ignore" NAs when indexing into a dataframe?

Tags:

r

I've got a dataframe that contains some NAs and when I index into it I would like R to ignore the NAs in that column.

fake = data.frame(id = 1:5,
                  color = c('red', NA, NA, 'blue', 'blue'),
                  value = rnorm(5))

sub = fake[fake$color != 'red', ]

     id color      value
NA   NA  <NA>         NA
NA.1 NA  <NA>         NA
4     4  blue -0.3227421
5     5  blue -1.0196561

The dataframe I want back is:

  id color      value
2  2  <NA>  0.2761862
3  3  <NA>  1.0029380
4  4  blue -0.3227421
5  5  blue -1.0196561

But for whatever reason, R NAs out the entire row when an NA in 'color' is encountered. I've tooled around with 'na.exclude', 'na.pass', etc., but haven't found a clean way to do this.

Erin Shellman asked Dec 11 '13 18:12


3 Answers

fake[!fake$color %in% "red",]
#   id color      value
# 2  2  <NA> -1.1341590
# 3  3  <NA> -0.6181337
# 4  4  blue  0.6115878
# 5  5  blue  1.3984797
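
Why this works, as a minimal sketch: `%in%` is built on `match()` and returns FALSE rather than NA for an NA element, so the negated index contains only TRUE and FALSE. The vector `color` and the names `ne_mask` and `in_mask` below are hypothetical stand-ins for illustration, not part of the original answer.

```r
# Hypothetical stand-in for fake$color
color <- c("red", NA, NA, "blue", "blue")

# != propagates NA, so the index has NA entries for rows 2 and 3
ne_mask <- color != "red"

# %in% never returns NA; note that ! applies to the whole %in% expression
in_mask <- !(color %in% "red")

print(ne_mask)
print(in_mask)
```

Indexing a data frame with `in_mask` therefore keeps the NA rows intact instead of blanking them.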

Josh O'Brien answered Nov 17 '22 18:11


Perhaps it is better to use setdiff in this case:

fake[setdiff(seq_len(nrow(fake)), which(fake$color == "red")), ]
#   id color     value
# 2  2  <NA>  1.015132
# 3  3  <NA> -1.425210
# 4  4  blue  1.089207
# 5  5  blue  1.442323
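
A minimal sketch of why the `setdiff` approach sidesteps the NAs, assuming a hypothetical vector `color` in place of the data frame column: `which()` returns only the positions that are TRUE and silently drops NA, so removing those positions from the full set of row positions leaves the NA rows in place.

```r
# Hypothetical stand-in for fake$color
color <- c("red", NA, NA, "blue", "blue")

# which() keeps only positions where the comparison is TRUE; NAs are dropped
red_rows <- which(color == "red")

# Everything that is not a "red" position, NA rows included
keep <- setdiff(seq_along(color), red_rows)

print(red_rows)
print(keep)
```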

A5C1D2H2I1M1N2O1R2T1 answered Nov 17 '22 18:11


You are getting tripped up by != returning NA rather than TRUE. This should succeed:

  sub = fake[ is.na(fake$color) | fake$color != 'red', ]

Nothing is equal (==) to NA, and furthermore nothing is not-equal (!=) to NA, not even NA itself; every such comparison returns NA. Notice:

> is.na(fake$color) | fake$color != 'red'
[1] FALSE  TRUE  TRUE  TRUE  TRUE

> NA == NA
[1] NA

But the NAs can give you what you want when combined using OR (|):

>  NA | TRUE
[1] TRUE
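
Putting the pieces together, here is a small sketch under stated assumptions: the data frame mirrors the one in the question, but `value` uses fixed, made-up numbers instead of `rnorm(5)` so the result is reproducible, and the names `na_or_true`, `na_or_false`, and `keep` are introduced only for illustration.

```r
# Three-valued logic: OR with TRUE is TRUE even when the other side is NA,
# but OR with FALSE stays NA because the answer depends on the unknown value
na_or_true  <- NA | TRUE
na_or_false <- NA | FALSE

# Copy of the question's data with fixed (made-up) values instead of rnorm()
fake <- data.frame(id = 1:5,
                   color = c("red", NA, NA, "blue", "blue"),
                   value = c(0.5, 0.3, 1.0, -0.3, -1.0))

# is.na() supplies TRUE exactly where != would have produced NA
keep <- is.na(fake$color) | fake$color != "red"

sub <- fake[keep, ]
print(sub)
```

Because `keep` contains no NA, the subset retains rows 2 through 5 with their original `id` and `value` entries.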

IRTFM answered Nov 17 '22 17:11