I have a table with a lot of columns and I want to remove the columns that have more than 500 missing values.
I already know the number of missing values per column from:
library(fields)
t(stats(mm))
I got:
N mean Std.Dev. min Q1 median Q3 max missing values
V1 1600 8.67 … 400
Some columns show NA for all of the statistics:
N mean Std.Dev. min Q1 median Q3 max missing values
V50 NA NA NA NA NA NA
I also want to remove this kind of column.
Here is a one-liner that keeps only the columns with at most 500 missing values:
mm[, colSums(is.na(mm)) <= 500]
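To check the filter on something reproducible, here is a minimal sketch on made-up data. The 2000-row frame and the column names V1 to V3 are assumptions chosen to mirror the question, not the asker's actual table. With 2000 rows, an all-NA column has 2000 missing values, so the same threshold removes it as well.

```r
# Toy data (assumed, not the asker's real table): 2000 rows, three columns
set.seed(1)
mm <- data.frame(
  V1 = c(rnorm(1600), rep(NA, 400)),  # 400 missing values: should be kept
  V2 = c(rnorm(1200), rep(NA, 800)),  # 800 missing values: should be dropped
  V3 = rep(NA_real_, 2000)            # entirely missing: should be dropped
)

# Count the NAs in each column
na_counts <- colSums(is.na(mm))

# Keep columns with at most 500 NAs; drop = FALSE keeps the result a
# data frame even when only one column survives
mmclean <- mm[, na_counts <= 500, drop = FALSE]
names(mmclean)
```

Only V1 survives here. Note that if the table had 500 rows or fewer, an all-NA column would not exceed the threshold and would need a separate check.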
If you store the results of the stats call like this:
tmpres<-t(stats(mm))
You can do something like:
whichcolsneedtogo<-apply(tmpres, 1, function(currow){all(is.na(currow)) || (currow["missing values"] > 500)})
Finally:
mmclean<-mm[, !whichcolsneedtogo]
Of course this is untested, as you have not provided data to reproduce your example.
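Another potential solution (works especially well with data frames). Note that as written it drops every column that contains at least one NA; to apply the 500 threshold instead, replace any(is.na(x)) with sum(is.na(x)) > 500:
data[, !sapply(data, function(x) any(is.na(x)))]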
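rem <- integer(0)  # indices of the columns to drop
for (col.nr in seq_len(ncol(data))) {
  n.missing <- sum(is.na(data[, col.nr]))
  # drop columns with more than 500 NAs, or that are entirely NA
  if (n.missing > 500 || n.missing == nrow(data)) {
    rem <- c(rem, col.nr)
  }
}
# negative indexing with an empty vector would drop every column, so guard against it
if (length(rem) > 0) data[, -rem, drop = FALSE] else data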
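The loop can also be wrapped in a small reusable function. This is only a sketch: the name drop_sparse_columns and the max_na parameter are hypothetical, not something from the answers above, and it assumes the input is a data frame.

```r
# Hypothetical helper: drop data frame columns that have more than `max_na`
# missing values or that consist entirely of NAs.
drop_sparse_columns <- function(df, max_na = 500) {
  # Number of NAs in each column
  n_missing <- vapply(df, function(x) sum(is.na(x)), integer(1))
  # Keep a column only if it is under the threshold and not all NA
  keep <- n_missing <= max_na & n_missing < nrow(df)
  df[, keep, drop = FALSE]
}

# Usage on a tiny example, with a lower threshold so the effect is visible
df <- data.frame(a = c(1, NA, 3, 4), b = c(NA, NA, NA, 4), c = rep(NA, 4))
result <- drop_sparse_columns(df, max_na = 2)
names(result)
```

Column a has one NA and is kept; b has three and c is entirely NA, so both are dropped.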