
Quickest way to exclude variables with zero variance in R

Tags: r, k-means

I am working with a very large .csv dataset for an evaluation, and I have run into the following warning and error:

Warning in preProcess.default(data, method = c("center", "scale")) :
  These variables have zero variances: num_outbound_cmds, is_host_login
Error in do_one(nmeth) : NA/NaN/Inf in foreign function call (arg 1)

What is the quickest way to exclude variables in my dataset whose variance is zero (0)?

Desta Haileselassie Hagos asked Nov 02 '25 14:11



1 Answer

The R package caret has a function, nearZeroVar, that identifies columns in a matrix or data frame that have zero or near-zero variance. By default it returns the column indices as a vector, which you can use to remove those columns.

> library(caret)
> df <- data.frame(a=1:5, b=sample(1:5), c=rep(1,5))
> df
  a b c
1 1 4 1
2 2 2 1
3 3 1 1
4 4 5 1
5 5 3 1
> nearZeroVar(df)
[1] 3
> df[,-nearZeroVar(df)]
  a b
1 1 4
2 2 2
3 3 1
4 4 5
5 5 3
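
If you only want to drop columns that are exactly constant (nearZeroVar also flags near-zero-variance columns), or you would rather not load caret just for this, here is a minimal base R sketch. The helper name keep_varying is made up for illustration and is not part of caret or base R. It uses a logical mask instead of a negative index, which sidesteps one pitfall: as far as I know, when no column is flagged nearZeroVar returns an empty vector, and df[, -integer(0)] then selects zero columns instead of all of them.

```r
# Toy data frame shaped like the example above; column c is constant.
df <- data.frame(a = 1:5, b = c(4, 2, 1, 5, 3), c = rep(1, 5))

# keep_varying() is a hypothetical helper name, not part of caret or base R.
# It keeps columns that take more than one distinct non-NA value.
keep_varying <- function(data) {
  varies <- vapply(
    data,
    function(col) length(unique(col[!is.na(col)])) > 1,
    logical(1)
  )
  # drop = FALSE keeps a data frame even if a single column survives.
  data[, varies, drop = FALSE]
}

df_clean <- keep_varying(df)
print(names(df_clean))

# Running it again on data with no constant columns removes nothing.
print(ncol(keep_varying(df_clean)))
```

Counting distinct values rather than calling var() means the same check also works for factor and character columns. After this step, preProcess with "center" and "scale" should no longer warn about zero-variance variables.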
Dthal answered Nov 04 '25 05:11



