Removing columns with missing values

Tags:

r

I have a table with many columns, and I want to remove the columns that have more than 500 missing values.

I already get the number of missing values per column with:

library(fields)
t(stats(mm))

I get:

  N     mean  Std.Dev.    min       Q1  median       Q3 max missing values
V1 1600 8.67  …                                               400

Some columns show NA for every statistic:

      N     mean  Std.Dev.    min       Q1  median       Q3 max missing values
 V50  NA    NA      NA         NA        NA                   NA

I also want to remove columns like these.
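Since the question includes no data, here is a minimal sketch of a reproducible stand-in, assuming a data frame of 2000 rows (V1 above has N = 1600 plus 400 missing). The columns V2 and V3 and their NA counts are hypothetical. It also shows that base R's colSums(is.na(...)) gives the same per-column missing counts as the "missing values" statistic, without loading fields.

```r
# Hypothetical stand-in for the asker's `mm`: 2000 rows, since V1 in the
# question has N = 1600 plus 400 missing values.
set.seed(1)
n <- 2000
mm <- data.frame(
  V1  = replace(rnorm(n), sample(n, 400), NA),  # 400 NAs -> should be kept
  V2  = replace(rnorm(n), sample(n, 600), NA),  # 600 NAs -> should be dropped
  V3  = rnorm(n),                               # no NAs  -> should be kept
  V50 = rep(NA_real_, n)                        # all NA  -> should be dropped
)

# Base R count of missing values per column (no extra package needed)
na_per_col <- colSums(is.na(mm))
na_per_col
# V1 = 400, V2 = 600, V3 = 0, V50 = 2000
```

Note that an all-NA column has as many NAs as the table has rows, so with 2000 rows it already exceeds the 500 threshold.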

asked Sep 07 '11 08:09 by Delphine


4 Answers

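Here is a one-liner that keeps only the columns with at most 500 missing values (an all-NA column is dropped as well, as long as the table has more than 500 rows):

mm[, colSums(is.na(mm)) <= 500, drop = FALSE]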
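Comparing colSums(is.na(mm)) with > 500 selects the columns to drop; the columns to keep are those with <= 500. The sketch below wraps this in a hypothetical helper, drop_na_columns (not part of any package), and adds an explicit all-NA check, which only matters when the table has 500 rows or fewer, so that an entirely empty column cannot slip under the threshold.

```r
# Hypothetical helper: keep columns with at most `max_na` missing values
# and drop columns that are entirely NA, whatever the row count.
drop_na_columns <- function(df, max_na = 500) {
  n_na <- colSums(is.na(df))
  keep <- n_na <= max_na & n_na < nrow(df)
  df[, keep, drop = FALSE]  # drop = FALSE keeps a data frame/matrix shape
}

# Toy data with 3 rows, so the all-NA column `b` has only 3 NAs.
toy <- data.frame(a = c(1, NA, 3), b = c(NA, NA, NA), c = c(NA, NA, 6))
names(drop_na_columns(toy, max_na = 1))    # "a"
names(drop_na_columns(toy, max_na = 500))  # "a" "c"
```

The same function works on a numeric matrix, because colSums(is.na(...)) and [, keep, drop = FALSE] behave the same way there.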

answered Oct 13 '22 16:10 by Ramnath


If you store the results of the stats call like this:

tmpres<-t(stats(mm))

You can do something like:

whichcolsneedtogo<-apply(tmpres, 1, function(currow){all(is.na(currow)) || isTRUE(currow["missing values"] > 500)})

Finally:

mmclean<-mm[, !whichcolsneedtogo, drop = FALSE]

Of course this is untested, as you have not provided data to reproduce your example.
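The row-by-row apply can also be written with vectorised operations on the same summary matrix. In this sketch tmpres is hand-built to mimic only the layout of t(stats(mm)) shown in the question; it is an assumption, not output from fields, and the V2 row is invented for illustration.

```r
# Hand-built stand-in for t(stats(mm)): one row per variable, one column
# per statistic (layout copied from the question; V2 values are invented).
tmpres <- rbind(
  V1  = c(N = 1600, mean = 8.67, "missing values" = 400),
  V2  = c(N = 1400, mean = 2.10, "missing values" = 600),
  V50 = c(N = NA,   mean = NA,   "missing values" = NA)
)

all_na   <- rowSums(is.na(tmpres)) == ncol(tmpres)  # every statistic is NA
miss     <- tmpres[, "missing values"]
too_many <- !is.na(miss) & miss > 500               # NA-safe threshold test
whichcolsneedtogo <- all_na | too_many
whichcolsneedtogo
# V1 FALSE, V2 TRUE, V50 TRUE
```

The !is.na(miss) guard keeps the result strictly TRUE/FALSE: without it, a row whose missing count is NA but whose other statistics are not would yield NA, and an NA in a logical column index causes an error or an NA column.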

answered Oct 13 '22 16:10 by Nick Sabbe


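Another potential solution, for data frames (sapply iterates over the columns). Note that it drops every column containing at least one NA, not only those with more than 500: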

data[,!sapply(data,function(x) any(is.na(x)))]
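To apply the question's threshold instead of "any NA", the per-column test can return a count rather than a logical. This is a minimal sketch on a hypothetical toy data frame, with max_na standing in for 500; vapply is used so the result type is fixed to one integer per column.

```r
# Hypothetical toy data; with the real data, max_na would be 500.
data <- data.frame(x = c(1, 2, NA, 4), y = c(NA, NA, NA, 4), z = 1:4)
max_na <- 1

# Number of NAs in each column (sum of a logical vector is an integer)
na_count <- vapply(data, function(x) sum(is.na(x)), integer(1))
cleaned <- data[, na_count <= max_na, drop = FALSE]
names(cleaned)  # "x" "z"
```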

answered Oct 13 '22 15:10 by chandler


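rem = NULL
for(col.nr in seq_len(ncol(data))){
    # count the NAs first, then compare the count with the threshold
    if(sum(is.na(data[, col.nr])) > 500 || all(is.na(data[, col.nr]))){
        rem = c(rem, col.nr)
    }
}
# -NULL is an error and -integer(0) would select no columns, so only subset when needed
if(length(rem) > 0) data = data[, -rem, drop = FALSE]
data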
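The same filtering can be done without collecting column indices by hand. This sketch uses base R's Filter(), which applies a predicate to each column of a data frame and keeps those for which it returns TRUE; keep_col and the toy data are hypothetical, and max_na stands in for 500.

```r
# Hypothetical toy data; max_na stands in for the question's 500.
data <- data.frame(p = c(NA, 2, 3), q = c(NA, NA, NA), r = c(1, NA, 3))
max_na <- 1

# Keep a column if it has at most max_na NAs and is not entirely NA.
keep_col <- function(x) sum(is.na(x)) <= max_na && !all(is.na(x))
cleaned <- Filter(keep_col, data)
names(cleaned)  # "p" "r"
```

Because Filter() returns the selected columns directly, there is no index vector that can end up empty, so the -rem edge case does not arise.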
answered Oct 13 '22 17:10 by pvoosten