 

Function to retain rows with >= 1 NA value (opposite of na.omit)

Tags: r

Is there a function that keeps rows with at least one NA and discards rows with no NAs, i.e. the opposite of na.omit()? I tried !na.omit(), but that didn't work.

asked Aug 02 '11 15:08 by SFun28


2 Answers

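Use the negation of complete.cases(), i.e. !complete.cases(x). complete.cases() returns a logical vector with one element per row that is TRUE when the row has no missing values, so negating it selects exactly the rows with at least one NA.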

Adapted from ?complete.cases:

data(airquality)
head(airquality[!complete.cases(airquality), ])

   Ozone Solar.R Wind Temp Month Day
5     NA      NA 14.3   56     5   5
6     28      NA 14.9   66     5   6
10    NA     194  8.6   69     5  10
11     7      NA  6.9   74     5  11
25    NA      66 16.6   57     5  25
26    NA     266 14.9   58     5  26
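As a minimal sketch on a small made-up data frame (the columns x and y are hypothetical), the same idea looks like this. It also shows why !na.omit() cannot work: na.omit() returns the filtered data rather than a logical index. It does, however, record the positions of the rows it dropped in its "na.action" attribute, which can be used to select those rows instead.

```r
# Hypothetical toy data frame: row 2 has a numeric NA, row 3 a character NA
df <- data.frame(
  x = c(1, NA, 3, 4),
  y = c("a", "b", NA, "d")
)

# complete.cases() is TRUE for rows without NAs, so negate it
has_na <- !complete.cases(df)
df[has_na, ]

# na.omit() stores the positions of the rows it dropped in the
# "na.action" attribute (NULL when nothing was dropped)
dropped <- attr(na.omit(df), "na.action")

# as.integer() strips the "omit" class and turns NULL into integer(0),
# which selects zero rows when there are no NAs at all
df[as.integer(dropped), ]
```

Both routes pick out the same rows; !complete.cases() is the more direct of the two.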
answered Oct 16 '22 17:10 by Andrie


Here is one approach using dummy data in a matrix; it adapts easily to the data frame case.

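set.seed(2)  # seed before any random draws so both runif() and sample() are reproducible
mat <- matrix(runif(100), ncol = 10)
mat[sample(100, 10)] <- NA  # set 10 randomly chosen cells to NA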

We can use is.na() to convert the matrix into a logical matrix of the same shape, TRUE wherever an element is NA:

> is.na(mat)
       [,1]  [,2]  [,3]  [,4]  [,5]  [,6]  [,7]  [,8]  [,9] [,10]
 [1,] FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE  TRUE
 [2,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 [3,] FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 [4,] FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE
 [5,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 [6,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 [7,] FALSE  TRUE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE
 [8,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE
 [9,] FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[10,] FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE  TRUE FALSE
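To see the shape of what is.na() returns without relying on random data, here is a small sketch on a hand-made matrix (the values are arbitrary, chosen only for illustration). The result has the same dimensions as the input, and which(..., arr.ind = TRUE) reports the row and column of each NA.

```r
# Hypothetical 3 x 3 matrix with NAs at [1, 2] and [3, 1]
m <- matrix(c( 1, NA, 3,
               4,  5, 6,
              NA,  8, 9), nrow = 3, byrow = TRUE)

# is.na() keeps the dimensions and flags each missing element
na_flags <- is.na(m)
dim(na_flags)

# Row/column positions of the NAs, in column-major order
pos <- which(na_flags, arr.ind = TRUE)
pos
```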

From that, we can apply() the any() function over the rows (MARGIN = 1) to get a logical index that is TRUE for rows with one or more NAs:

> apply(is.na(mat), 1, any)
 [1]  TRUE FALSE  TRUE  TRUE FALSE FALSE  TRUE  TRUE  TRUE  TRUE
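A commonly used alternative to apply() here is rowSums(), which counts the TRUE values in each row of is.na() in a single vectorised call. The sketch below, on a small hypothetical matrix, checks that both give the same index.

```r
# Hypothetical matrix: rows 1 and 3 contain an NA, row 2 does not
m <- matrix(c( 1, NA, 3,
               4,  5, 6,
              NA,  8, 9), nrow = 3, byrow = TRUE)

# Row-wise any(), as in the answer
ind_apply <- apply(is.na(m), 1, any)

# Vectorised alternative: count NAs per row, keep rows with at least one
ind_rowsums <- rowSums(is.na(m)) > 0

identical(ind_apply, ind_rowsums)

# drop = FALSE keeps a matrix even if only one row is selected
m[ind_rowsums, , drop = FALSE]
```

Using drop = FALSE matters when exactly one row matches: plain m[ind, ] would then collapse the result to a vector.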

Putting this together, we have:

ind <- apply(is.na(mat), 1, any)
mat[ind, ]

giving:

          [,1]       [,2]      [,3]        [,4]       [,5]       [,6]
[1,] 0.6618988 0.01041453 0.9817279 0.007109038 0.77002786         NA
[2,] 0.8368892         NA 0.1150841 0.683403423 0.62512173 0.04217553
[3,] 0.1505014 0.86886104 0.1632009 0.929720222         NA 0.18467346
[4,] 0.1492469         NA 0.9746879 0.785878913 0.38814476         NA
[5,] 0.3570626 0.28487057 0.3490884 0.988902156 0.46150111 0.86784466
[6,] 0.9626440         NA 0.5019699 0.613952910 0.21867519 0.40264274
[7,] 0.1323720 0.15046975 0.8103973 0.710185730 0.06593551 0.57268500
           [,7]      [,8]      [,9]     [,10]
[1,] 0.35064257 0.9767552 0.2009347        NA
[2,] 0.02505036 0.3799989 0.9806000 0.3733586
[3,] 0.40110104 0.5603876 0.8289221 0.5743769
[4,] 0.97151543 0.4269434 0.8989719 0.8726963
[5,] 0.32372244        NA 0.4533770 0.1105549
[6,] 0.73319143 0.1153091 0.1474178 0.9527002
[7,]         NA 0.4400317        NA 0.5690021
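Since the approach is said to adapt to data frames, here is a minimal sketch of that case, assuming a small made-up data frame with mixed column types. is.na() on a data frame returns a logical matrix, so the same apply(..., 1, any) index carries over unchanged.

```r
# Hypothetical data frame mixing integer, numeric and character columns
df <- data.frame(
  id    = 1:4,
  score = c(2.5, NA, 7.1, 3.3),
  label = c("a", "b", NA, "d")
)

# is.na() on a data frame gives a logical matrix, one row per data row
ind <- apply(is.na(df), 1, any)

# Keep only the rows with at least one NA
df[ind, ]
```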
answered Oct 16 '22 15:10 by Gavin Simpson