Remove rows with all or some NAs (missing values) in data.frame

I'd like to remove the lines in this data frame that:

a) contain NAs across all columns. Below is my example data frame.

             gene hsap mmul mmus rnor cfam
1 ENSG00000208234    0   NA   NA   NA   NA
2 ENSG00000199674    0    2    2    2    2
3 ENSG00000221622    0   NA   NA   NA   NA
4 ENSG00000207604    0   NA   NA    1    2
5 ENSG00000207431    0   NA   NA   NA   NA
6 ENSG00000221312    0    1    2    3    2

Basically, I'd like to get a data frame such as the following.

             gene hsap mmul mmus rnor cfam
2 ENSG00000199674    0    2    2    2    2
6 ENSG00000221312    0    1    2    3    2

b) contain NAs in only some columns, so I can also get this result:

             gene hsap mmul mmus rnor cfam
2 ENSG00000199674    0    2    2    2    2
4 ENSG00000207604    0   NA   NA    1    2
6 ENSG00000221312    0    1    2    3    2

Benoit B. asked Feb 01 '11 11:02



2 Answers

Also check complete.cases:

> final[complete.cases(final), ]
             gene hsap mmul mmus rnor cfam
2 ENSG00000199674    0    2    2    2    2
6 ENSG00000221312    0    1    2    3    2

na.omit is nicer when you just want to drop every row that contains any NA. complete.cases allows partial selection: pass it only certain columns of the data frame and it checks only those:

> final[complete.cases(final[ , 5:6]), ]
             gene hsap mmul mmus rnor cfam
2 ENSG00000199674    0    2    2    2    2
4 ENSG00000207604    0   NA   NA    1    2
6 ENSG00000221312    0    1    2    3    2
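To try the two calls above without the original data, the example can be rebuilt by hand. This is only a sketch: the values are copied from the question, the name final is taken from the calls above, and selecting rnor and cfam by name instead of by position (5:6) is an illustrative choice, not part of the original answer.

```r
# Rebuild the example data frame from the question (values copied from the post).
final <- data.frame(
  gene = c("ENSG00000208234", "ENSG00000199674", "ENSG00000221622",
           "ENSG00000207604", "ENSG00000207431", "ENSG00000221312"),
  hsap = c(0, 0, 0, 0, 0, 0),
  mmul = c(NA, 2, NA, NA, NA, 1),
  mmus = c(NA, 2, NA, NA, NA, 2),
  rnor = c(NA, 2, NA, 1, NA, 3),
  cfam = c(NA, 2, NA, 2, NA, 2),
  stringsAsFactors = FALSE
)

# Keep rows with no NA in any column.
all_complete <- final[complete.cases(final), ]

# Keep rows with no NA in rnor and cfam only (the same columns as 5:6).
partial <- final[complete.cases(final[, c("rnor", "cfam")]), ]

print(all_complete)
print(partial)
```

Selecting by name keeps the filter valid if the column order changes later.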

Your solution can't work. If you insist on using is.na, then you have to do something like:

> final[rowSums(is.na(final[ , 5:6])) == 0, ]
             gene hsap mmul mmus rnor cfam
2 ENSG00000199674    0    2    2    2    2
4 ENSG00000207604    0   NA   NA    1    2
6 ENSG00000221312    0    1    2    3    2

but using complete.cases is much clearer, and faster.
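The rowSums(is.na(...)) idiom is still handy for one case from the question title that complete.cases does not cover directly: dropping only the rows in which every one of a chosen set of columns is NA. A minimal sketch, assuming the same example data; the names species, na_count and not_all_na are made up for illustration.

```r
# Example data from the question (values copied from the post).
final <- data.frame(
  gene = c("ENSG00000208234", "ENSG00000199674", "ENSG00000221622",
           "ENSG00000207604", "ENSG00000207431", "ENSG00000221312"),
  hsap = c(0, 0, 0, 0, 0, 0),
  mmul = c(NA, 2, NA, NA, NA, 1),
  mmus = c(NA, 2, NA, NA, NA, 2),
  rnor = c(NA, 2, NA, 1, NA, 3),
  cfam = c(NA, 2, NA, 2, NA, 2),
  stringsAsFactors = FALSE
)

# Columns to inspect (hypothetical choice: everything except gene and hsap).
species <- c("mmul", "mmus", "rnor", "cfam")

# Number of NAs per row within those columns.
na_count <- rowSums(is.na(final[, species]))

# Drop a row only when all inspected columns are NA.
not_all_na <- final[na_count < length(species), ]

print(not_all_na)
```

Changing the comparison (for example na_count <= 1) turns the same count into a tolerance for a limited number of missing values per row.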

Joris Meys answered Oct 23 '22 09:10


Try na.omit(your.data.frame). As for the second question, try posting it as another question (for clarity).
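A short sketch of that call on the example data from the question, with final standing in for your.data.frame; the names cleaned and dropped are made up for illustration.

```r
# Example data from the question (values copied from the post).
final <- data.frame(
  gene = c("ENSG00000208234", "ENSG00000199674", "ENSG00000221622",
           "ENSG00000207604", "ENSG00000207431", "ENSG00000221312"),
  hsap = c(0, 0, 0, 0, 0, 0),
  mmul = c(NA, 2, NA, NA, NA, 1),
  mmus = c(NA, 2, NA, NA, NA, 2),
  rnor = c(NA, 2, NA, 1, NA, 3),
  cfam = c(NA, 2, NA, 2, NA, 2),
  stringsAsFactors = FALSE
)

# Drop every row that has an NA in any column.
cleaned <- na.omit(final)

# na.omit records which rows it removed in the "na.action" attribute.
dropped <- attr(cleaned, "na.action")

print(cleaned)
print(dropped)
```

The na.action attribute is useful when you need to report or inspect which rows were discarded.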

Roman Luštrik answered Oct 23 '22 09:10