How to remove a row which contain only missing values in R?

Tags:

r

I have a large data set with 11 columns and 100,000 rows (for example) containing the values 1, 2, 3 and 4, where 4 denotes a missing value. Some rows are completely missing, i.e. they contain 4 in all 11 columns. For example:

"4"  "4"  "4"  "4"  "4"  "4"  "4"  "4"  "4"  "4"   "4"

Now I need to remove only those rows that are completely missing. In other words, I want to keep the rows with fewer than 11 missing values. I have tried na.omit, but it does not work in my case.

Thanks in advance.

Iftikhar asked Aug 25 '11 at 04:08




2 Answers

Perhaps your best option is to use R's idiom for working with missing values, namely NA. Once the missing values are coded as NA, you can use functions such as complete.cases to filter the rows.

Create some sample data with missing values (i.e. with value 4):

set.seed(123)
m <- matrix(sample(1:4, 30, prob=c(0.3, 0.3, 0.3, 0.1), replace=TRUE), ncol=6)
m[4, ] <- rep(4, 6)

Replace all values equal to 4 with NA:

m[m==4] <- NA
m
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    1    1   NA    2    2    2
[2,]    2    3    3    1    2    3
[3,]    3    2    2    1    2    3
[4,]   NA   NA   NA   NA   NA   NA
[5,]   NA    3    1   NA    2    1
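If the data are read from a file, a possible alternative is to do this recoding at import time: read.table and read.csv accept an na.strings argument listing the strings to treat as NA. A minimal sketch, where the inline CSV text and the names csv, dat and kept are hypothetical stand-ins for the real file:

```r
# Hypothetical inline CSV standing in for the real file; "4" marks a missing value
csv <- "a,b,c
1,4,2
4,4,4
3,2,4"

# na.strings = "4" turns every field equal to "4" into NA while reading
dat <- read.csv(text = csv, na.strings = "4")

# Keep rows whose NA count is below the number of columns (drops the all-4 row)
kept <- dat[rowSums(is.na(dat)) < ncol(dat), ]
kept
```

Keep in mind that na.strings applies to every column, so this only fits if 4 never occurs as a genuine value.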

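Now you can use a variety of functions that deal with NA values. For example, complete.cases returns TRUE for each row that contains no NA, so indexing with it keeps only the complete cases. Note that this drops every row with any missing value (rows 1 and 5 of m, not only row 4), which is stricter than removing only the completely missing rows: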

m[complete.cases(m), ]

     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    2    3    3    1    2    3
[2,]    3    2    2    1    2    3
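To drop only the rows in which every value is NA (what the question asks for), one option is to test each row with all() over is.na(). A minimal sketch that rebuilds the printed matrix m literally, so it does not depend on the random seed:

```r
# The example matrix printed above, typed in literally
m <- matrix(c( 1,  1, NA,  2,  2,  2,
               2,  3,  3,  1,  2,  3,
               3,  2,  2,  1,  2,  3,
              NA, NA, NA, NA, NA, NA,
              NA,  3,  1, NA,  2,  1),
            nrow = 5, byrow = TRUE)

# TRUE only for rows where every entry is NA
all_na <- apply(is.na(m), 1, all)

# Drop only the all-NA rows; partially missing rows 1 and 5 are kept
kept <- m[!all_na, , drop = FALSE]
kept
```

apply() loops over the rows, so on 100,000 rows a vectorised test such as rowSums(is.na(m)) == ncol(m) is likely to be faster.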

For more information, see ?complete.cases or ?na.fail in the stats package.
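This also suggests why na.omit did not help in the question: it only reacts to NA, so a matrix coded with 4 passes through unchanged, and once 4 is recoded as NA it drops every row containing any NA, just like complete.cases. A small sketch with a hypothetical 3 x 3 matrix m4:

```r
# Hypothetical data still using 4 as the missing-value code
m4 <- matrix(c(1, 4, 2,
               4, 4, 4,
               3, 2, 1), nrow = 3, byrow = TRUE)
nrow(na.omit(m4))    # 3: no NA present, so nothing is dropped

# Recode 4 as NA and try again
m_na <- m4
m_na[m_na == 4] <- NA
nrow(na.omit(m_na))  # 1: every row containing any NA is dropped
```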

Andrie answered Oct 17 '22 at 11:10



I found this solution elsewhere and am pasting it here using Andrie's code to generate the initial data set.

First generate the data set:

set.seed(123)
m <- matrix(sample(1:4, 30, prob=c(0.3, 0.3, 0.3, 0.1), replace=TRUE), ncol=6)
m[4, ] <- rep(4, 6)
m[m==4] <- NA
m

Here is the initial data set:

1    1    NA   2    2    2
2    3    3    1    2    3
3    2    2    1    2    3
NA   NA   NA   NA   NA   NA
NA   3    1    NA   2    1

Now remove rows that only contain missing observations:

# keep rows whose number of NAs is smaller than the number of columns
m[rowSums(is.na(m)) < ncol(m), ]

Here is the result:

1    1    NA   2    2    2
2    3    3    1    2    3
3    2    2    1    2    3
NA   3    1    NA   2    1
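The same rowSums idea also appears to work without recoding to NA: comparing the data with 4 gives a logical matrix, and rowSums counts the TRUE values in each row. A minimal sketch, where m4 is a hypothetical copy of the example data that keeps 4 as the missing-value code:

```r
# Example data with 4 still used as the missing-value code (same values as above)
m4 <- matrix(c(1, 1, 4, 2, 2, 2,
               2, 3, 3, 1, 2, 3,
               3, 2, 2, 1, 2, 3,
               4, 4, 4, 4, 4, 4,
               4, 3, 1, 4, 2, 1),
             nrow = 5, byrow = TRUE)

# Number of missing values (4s) in each row
n_missing <- rowSums(m4 == 4)
n_missing  # 1 0 0 6 2

# Keep rows that are not entirely missing
kept <- m4[n_missing < ncol(m4), , drop = FALSE]
kept
```

Because R converts the number to a string when comparing it with character data, m4 == 4 should also match the quoted "4" values shown in the question if the data are stored as character.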
Mark Miller answered Oct 17 '22 at 12:10
