Removing columns with missing values

Question

I have a table with a lot of columns and I want to remove the columns that have more than 500 missing values.

I already know the number of missing values per column, using:

library(fields)
t(stats(mm))

I got:

  N     mean  Std.Dev.    min       Q1  median       Q3 max missing values
V1 1600 8.67  …                                               400

Some columns show NA for every statistic:

      N     mean  Std.Dev.    min       Q1  median       Q3 max missing values
 V50  NA    NA      NA         NA        NA                   NA

I also want to remove these kinds of columns.

Ramnath · Accepted Answer

Here is a one-liner that keeps only the columns with at most 500 missing values:

mm[colSums(is.na(mm)) <= 500]
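
A minimal sketch of the column-count approach on toy data, assuming a data frame named `mm` and a threshold of 2 instead of 500 so the example stays small. The column values and the name `max_na` are made up for illustration.

```r
# Toy data (hypothetical): 5 rows, threshold of 2 missing values instead of 500
mm <- data.frame(
  V1 = c(1, 2, NA, 4, 5),     # 1 NA   -> kept
  V2 = c(NA, NA, NA, 4, 5),   # 3 NAs  -> dropped
  V3 = rep(NA_real_, 5),      # all NA -> dropped
  V4 = 1:5                    # 0 NAs  -> kept
)
max_na <- 2

# Count the NAs in each column, then keep columns at or below the threshold
na_count <- colSums(is.na(mm))
mm_clean <- mm[na_count <= max_na]

names(mm_clean)
```

An all-NA column is removed by the count alone only when the table has more rows than the threshold. For shorter tables, `na_count < nrow(mm)` could be added to the keep condition.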

Nick Sabbe · Answer

If you store the results of the stats call like this:

tmpres<-t(stats(mm))

You can do something like:

whichcolsneedtogo<-apply(tmpres, 1, function(currow){all(is.na(currow)) || (currow["missing values"] > 500)})

Finally:

mmclean<-mm[!whichcolsneedtogo]

Of course this is untested, as you have not provided data to reproduce your example.
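
As a rough way to try this logic without the original data, the sketch below builds a small stand-in for `t(stats(mm))` in base R. It assumes, based on the output shown in the question, that the transposed summary has one row per column of `mm`, a column named "missing values", and NA everywhere for an all-NA column. The helper `summarise_col` is hypothetical and is not part of `fields`; the threshold is 2 instead of 500.

```r
# Toy data (hypothetical)
mm <- data.frame(
  V1 = c(1, 2, NA, 4, 5),
  V2 = c(NA, NA, NA, 4, 5),
  V3 = rep(NA_real_, 5)
)

# Hypothetical stand-in for one column of the summary: NA everywhere for an
# all-NA column, mimicking the output shown in the question
summarise_col <- function(x) {
  if (all(is.na(x))) {
    return(c(N = NA, mean = NA, "missing values" = NA))
  }
  c(N = sum(!is.na(x)),
    mean = mean(x, na.rm = TRUE),
    "missing values" = sum(is.na(x)))
}

# One row per column of mm, like the transposed summary in the question
tmpres <- t(sapply(mm, summarise_col))

# A column goes if its summary row is all NA or reports too many missing values
whichcolsneedtogo <- apply(tmpres, 1, function(currow) {
  all(is.na(currow)) || currow["missing values"] > 2
})

mmclean <- mm[!whichcolsneedtogo]
names(mmclean)
```

The `all(is.na(currow))` test comes first on purpose: `||` short-circuits, so the comparison against an NA count is never evaluated for an all-NA row.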

chandler · Answer

Another potential solution (works especially well with data frames). Note that it drops every column containing at least one NA, rather than applying a threshold:

data[,!sapply(data,function(x) any(is.na(x)))]
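
The expression above removes a column as soon as it contains a single NA. A hedged sketch of how the same `sapply` pattern could be adapted to a count threshold, on made-up data with a threshold of 2 rather than 500 (the names `max_na`, `too_sparse` and `has_any_na` are illustrative):

```r
# Toy data (hypothetical)
data <- data.frame(
  a = c(1, NA, 3, 4),                 # 1 NA
  b = c(NA, NA, NA, 4),               # 3 NAs
  label = c("x", "y", "z", "w")       # 0 NAs
)
max_na <- 2

# TRUE for columns whose NA count exceeds the threshold
too_sparse <- sapply(data, function(x) sum(is.na(x)) > max_na)

# drop = FALSE keeps a data frame even if only one column survives
data_clean <- data[, !too_sparse, drop = FALSE]

# For comparison: the any() version keeps only columns with no NA at all
has_any_na <- sapply(data, function(x) any(is.na(x)))
data_complete <- data[, !has_any_na, drop = FALSE]

names(data_clean)
names(data_complete)
```

With the threshold, column `a` survives despite its single NA; with `any()`, only `label` remains.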

pvoosten · Answer

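rem <- integer(0)
for (col.nr in seq_len(ncol(data))) {
    col.na <- is.na(data[, col.nr])
    # flag columns with more than 500 NAs, or containing only NAs
    if (sum(col.na) > 500 || all(col.na)) {
        rem <- c(rem, col.nr)
    }
}
# data[, -integer(0)] would select no columns, so only subset when needed
if (length(rem) > 0) data <- data[, -rem, drop = FALSE]
data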

Tags:

r

Delphine
