Remove columns with standard deviation of zero

Tags: dataframe, r

I want to remove all columns with a standard deviation of zero from a data.frame.

This does not work:

  df <- df[, ! apply(df , 2 , function(x) sd(x)==0 ) ]

I get the error:

undefined columns selected

UPDATE

I selected Filter as my preferred answer as it also seems to handle NAs, which is very useful.

For example, in

df <- data.frame(v1=c(0,0,NA,0,0), v2=1:5)

the column 'v1' is removed with Filter while the apply methods produce errors.

Thanks to all the other solutions, I learned a lot from them.

UPDATE2:

Those errors given by apply can be fixed by adding na.rm = TRUE to the call to sd like so:

df[, ! apply(df , 2 , function(x) sd(x, na.rm = TRUE)==0 ) ]
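As a minimal sketch of the two updates above, using the example data frame from the question: the names `sds`, `zero_sd`, `res_apply` and `res_filter` are made up for illustration, and `drop = FALSE` is an addition not in the original call.

```r
# Example data from the question: v1 is constant apart from one NA.
df <- data.frame(v1 = c(0, 0, NA, 0, 0), v2 = 1:5)

# Without na.rm, sd() of a column containing NA is NA, so the logical
# index built from it contains NA, which `[` cannot use to pick columns.
sds <- apply(df, 2, sd)
is.na(sds[["v1"]])

# With na.rm = TRUE the NA is ignored and v1 has a standard deviation of 0.
zero_sd <- apply(df, 2, function(x) sd(x, na.rm = TRUE) == 0)

# drop = FALSE (an addition) keeps the result a data frame even when
# only one column survives.
res_apply <- df[, !zero_sd, drop = FALSE]

# Filter() treats an NA predicate result as FALSE, so it drops v1 as well.
res_filter <- Filter(function(x) sd(x) != 0, df)

names(res_apply)
names(res_filter)
```

Both approaches should leave only `v2` here. Note that `apply()` converts the data frame to a matrix first, so this sketch assumes all columns are numeric.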
asked Jun 12 '15 at 08:06 by spore234


2 Answers

Use Filter():

Filter(function(x) sd(x) != 0, df)
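If the data frame may also contain non-numeric columns, one possible variation is to give `Filter()` a predicate that skips them. This is only a sketch: the helper `keep_column` and the sample data are hypothetical, and keeping non-numeric columns untouched is an assumption about what is wanted, not part of the answer above.

```r
# Hypothetical sample data: one character column, one constant, one varying.
df <- data.frame(
  id    = c("a", "b", "c", "d"),
  const = c(3, 3, 3, 3),
  x     = c(1.5, 2.0, 2.5, 4.0),
  stringsAsFactors = FALSE
)

# Hypothetical helper: returns TRUE for columns that should be kept.
keep_column <- function(x) {
  if (!is.numeric(x)) return(TRUE)  # assumption: leave non-numeric columns alone
  s <- sd(x, na.rm = TRUE)          # ignore NAs when measuring spread
  !is.na(s) && s != 0               # drop columns with sd of 0 or undefined sd
}

res <- Filter(keep_column, df)
names(res)
```

With this predicate, a numeric column that is entirely `NA` or has fewer than two non-missing values is dropped as well, since its standard deviation is undefined.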
answered Sep 24 '22 at 04:09 by grrgrrbla


In addition to @grrgrrbla's and @akrun's answers using Filter, here is the correct way to do what you originally had in mind:

df <- df[, !sapply(df, function(x) { sd(x) == 0} )]

Or

df <- df[, sapply(df, function(x) { sd(x) != 0} )]

I used sapply() to get a logical vector which is TRUE when a data frame column has a standard deviation of 0 and FALSE otherwise. Then I subset the original data frame using this vector.
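To make the intermediate vector visible, here is a small sketch on made-up data. The column names and the variable `is_const` are illustrative only, and `drop = FALSE` is an addition that guards against the result collapsing to a plain vector when a single column remains.

```r
# Made-up data: columns a and c are constant, b varies.
df <- data.frame(a = c(2, 2, 2), b = c(1, 2, 3), c = c(5, 5, 5))

# sapply() iterates over the columns of the data frame and returns a
# named logical vector with one element per column.
is_const <- sapply(df, function(x) sd(x) == 0)
is_const

# Keep only the columns flagged FALSE; drop = FALSE keeps a data frame
# even though just one column is left.
res <- df[, !is_const, drop = FALSE]
names(res)
```

Unlike `apply(df, 2, ...)`, `sapply()` does not convert the data frame to a matrix first, so each column is passed to `sd()` with its own type.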

answered Sep 22 '22 at 04:09 by Tim Biegeleisen