 

Consecutive NAs in a column

Tags: list, r, na

I'd like to remove the rows that belong to a run of 3 or more consecutive NAs in one column.

      [,1] [,2] 
[1,]    1    1   
[2,]   NA    1   
[3,]    2    4   
[4,]   NA    3   
[6,]    1    4   
[7,]   NA    8
[8,]   NA    5
[9,]   NA    6

so I'd end up with this data:

      [,1] [,2] 
[1,]    1    1   
[2,]   NA    1   
[3,]    2    4   
[4,]   NA    3   
[6,]    1    4 

I did some research and tried this code:

data[!(rowSums(is.na(data)) > 3), ]

but I think this only counts the NAs within each row, not consecutive NAs down a column.

Marco asked May 30 '13 17:05


1 Answer

As mentioned, rle is a good place to start:

is.na.rle <- rle(is.na(data[, 1]))
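
To see what rle returns here, a small sketch on the first column of the question's example may help. The vector `x` is an assumption, retyped from the matrix shown in the question:

```r
# First column of the example matrix from the question (retyped by hand)
x <- c(1, NA, 2, NA, 1, NA, NA, NA)

# rle() collapses the logical vector into runs of identical values
r <- rle(is.na(x))
r$lengths  # 1 1 1 1 1 3
r$values   # FALSE TRUE FALSE TRUE FALSE TRUE
```

Each TRUE in `values` is a run of NAs, and the matching entry in `lengths` says how long that run is.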

Since NAs are "bad" only when they come in runs of three or more, we can rewrite the values:

is.na.rle$values <- is.na.rle$values & is.na.rle$lengths >= 3

Finally, use inverse.rle to build the vector of indices to filter:

data[!inverse.rle(is.na.rle), ]
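
Putting the three steps together, here is a minimal sketch wrapped in a helper. The name `drop_na_runs` and its `col` and `n` arguments are hypothetical, introduced only for illustration, and the matrix is retyped from the question:

```r
# Hypothetical helper (not part of the original answer): drop rows that
# fall inside a run of at least `n` consecutive NAs in column `col`.
drop_na_runs <- function(data, col = 1, n = 3) {
  runs <- rle(is.na(data[, col]))
  # Keep TRUE only for NA runs that are long enough to count as "bad"
  runs$values <- runs$values & runs$lengths >= n
  # Expand back to one flag per row and keep the unflagged rows
  data[!inverse.rle(runs), , drop = FALSE]
}

# Example matrix retyped from the question
data <- cbind(c(1, NA, 2, NA, 1, NA, NA, NA),
              c(1, 1, 4, 3, 4, 8, 5, 6))

result <- drop_na_runs(data)
result  # the first five rows; the isolated NAs in rows 2 and 4 are kept
```

The `drop = FALSE` is there so that the result stays a matrix even if only one row survives.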
flodel answered Sep 23 '22 02:09