 

Retrieving subset of a data frame by finding entries with NA in specific columns

Tags: r, subset

Suppose we have a data frame with NA values, like this:

> data
A  B  C  D
1  3  NA 4
2  1  3  4
NA 3  3  5
4  2  NA NA
2  NA 4  3
1  1  1  2
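To make the example reproducible, here is one way to rebuild this data frame in R. The values are copied from the printout above; storing them as plain numeric (double) columns is an assumption, since the question does not show the column types.

```r
# Rebuild the example data frame from the question.
# Assumption: all columns are numeric (double); the original types are not shown.
data <- data.frame(
  A = c(1, 2, NA, 4, 2, 1),
  B = c(3, 1, 3, 2, NA, 1),
  C = c(NA, 3, 3, NA, 4, 1),
  D = c(4, 4, 5, NA, 3, 2)
)

print(data)
```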

I would like a general method for retrieving the rows that have an NA value in column A or column C. For the data above, the output should be:

A  B  C  D
1  3  NA 4
NA 3  3  5
4  2  NA NA

I tried using subset() like this: subset(data, A==NA | C==NA), but it didn't work. Any ideas?
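The reason this does not work: any comparison with NA, including NA == NA, evaluates to NA rather than TRUE or FALSE, and subset() drops rows whose condition is NA. The sketch below rebuilds the example data (numeric columns assumed) and shows the empty result next to the usual fix with is.na().

```r
# Example data from the question (assumption: numeric columns)
data <- data.frame(
  A = c(1, 2, NA, 4, 2, 1),
  B = c(3, 1, 3, 2, NA, 1),
  C = c(NA, 3, 3, NA, 4, 1),
  D = c(4, 4, 5, NA, 3, 2)
)

# Comparing anything with NA yields NA, even NA == NA
print(NA == NA)
print(data$A == NA)   # NA for every row

# subset() treats an NA condition as "drop this row", so nothing survives
broken <- subset(data, A == NA | C == NA)
print(nrow(broken))

# is.na() returns a proper TRUE/FALSE vector, which is what subset() needs
fixed <- subset(data, is.na(A) | is.na(C))
print(fixed)          # rows 1, 3 and 4
```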

Christian Bueno asked Nov 30 '22 12:11



1 Answer

A very handy function for this sort of thing is complete.cases. It checks each row for NA values and returns FALSE if the row contains any; if the row has no NAs, it returns TRUE.
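For example, on the question's data (rebuilt here, assuming numeric columns), restricting complete.cases() to columns A and C gives a different result than applying it to the whole data frame, because row 5 only has an NA in column B:

```r
# Example data from the question (assumption: numeric columns)
data <- data.frame(
  A = c(1, 2, NA, 4, 2, 1),
  B = c(3, 1, 3, 2, NA, 1),
  C = c(NA, 3, 3, NA, 4, 1),
  D = c(4, 4, 5, NA, 3, 2)
)

# Only columns A and C are checked: row 5 counts as complete
cc <- complete.cases(data[, c("A", "C")])
print(cc)       # FALSE TRUE FALSE FALSE TRUE TRUE

# All columns are checked: row 5 is now incomplete because of B
cc_all <- complete.cases(data)
print(cc_all)   # FALSE TRUE FALSE FALSE FALSE TRUE
```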

So, select just the two columns of interest, apply complete.cases() to them, negate the result, and use it to pick the matching rows from the original data, as follows:

# assuming your data is in 'df'
df[!complete.cases(df[, c("A", "C")]), ]
#    A B  C  D
# 1  1 3 NA  4
# 3 NA 3  3  5
# 4  4 2 NA NA
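Since the question asks for a general method, the same idea can be wrapped in a small function. The name rows_with_na is hypothetical, and this version swaps complete.cases() for rowSums(is.na(...)) > 0, which selects the same rows: those with at least one NA in the chosen columns.

```r
# Hypothetical helper: return the rows of `df` that have an NA in any of `cols`
rows_with_na <- function(df, cols) {
  # is.na() on a data frame gives a logical matrix; count NAs per row
  has_na <- rowSums(is.na(df[cols])) > 0
  df[has_na, , drop = FALSE]
}

# Example data from the question (assumption: numeric columns)
data <- data.frame(
  A = c(1, 2, NA, 4, 2, 1),
  B = c(3, 1, 3, 2, NA, 1),
  C = c(NA, 3, 3, NA, 4, 1),
  D = c(4, 4, 5, NA, 3, 2)
)

res_ac <- rows_with_na(data, c("A", "C"))
print(res_ac)   # rows 1, 3 and 4

res_b <- rows_with_na(data, "B")
print(res_b)    # row 5 only
```

Indexing with df[cols] rather than df[, cols] keeps a data frame even when a single column name is passed, so is.na() still returns a matrix that rowSums() can handle.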
Arun answered Dec 10 '22 09:12
