Remove columns with standard deviation of zero

Tags:

dataframe

r

I want to remove all columns with a standard deviation of zero from a data.frame.

This does not work:

  df <- df[, ! apply(df , 2 , function(x) sd(x)==0 ) ]

I get the error:

  undefined columns selected
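A likely cause, consistent with the NA example in the UPDATE, is a missing value. sd() returns NA for a column that contains NA, so the logical index contains NA, and indexing data frame columns with an NA produces exactly this error. The small data frame below is the one from the UPDATE; the variable names `keep` and `res` are just for illustration.

```r
df <- data.frame(v1 = c(0, 0, NA, 0, 0), v2 = 1:5)

# sd() is NA for v1 because of the missing value, so the comparison is NA too
keep <- !apply(df, 2, function(x) sd(x) == 0)
print(keep)  # v1 is NA, v2 is TRUE

# A logical column index containing NA makes `[.data.frame` stop with an error
res <- tryCatch(df[, keep], error = function(e) conditionMessage(e))
print(res)
```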

UPDATE

I selected Filter as my preferred answer because it also seems to handle NAs, which is very useful.

For example, in

  df <- data.frame(v1 = c(0, 0, NA, 0, 0), v2 = 1:5)

the column 'v1' is removed with Filter while the apply methods produce errors.
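Note what Filter() is actually doing here: it returns only the columns for which the predicate gives TRUE. For a column containing NA, sd(x) is NA, so sd(x) != 0 is NA as well, and that column is discarded whether or not it varies. A minimal check with a made-up data frame `df2` (column `a` varies but contains an NA):

```r
df2 <- data.frame(a = c(1, 2, NA, 4),  # varies, but has a missing value
                  b = 1:4,             # varies
                  c = rep(7, 4))       # constant

# a is dropped because its sd() is NA, c because its sd() is 0
res <- Filter(function(x) sd(x) != 0, df2)
print(names(res))
```

So v1 disappears in the example above because of its NA, not because its standard deviation is zero.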

Thanks for all the other solutions; I learned a lot from them.

UPDATE 2

The errors from the apply approach can be fixed by adding na.rm = TRUE to the call to sd(), like so:

  df[, ! apply(df , 2 , function(x) sd(x, na.rm = TRUE)==0 ) ]
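One caveat with this version: when exactly one column survives, df[, j] returns a plain vector instead of a data frame, because drop defaults to TRUE when only columns are selected. The sketch below reuses the example data frame from the UPDATE and shows that passing drop = FALSE keeps a one-column data frame. Also keep in mind that sd(x, na.rm = TRUE) is still NA for a column with fewer than two non-missing values, so such a column would bring back the "undefined columns selected" error.

```r
df <- data.frame(v1 = c(0, 0, NA, 0, 0), v2 = 1:5)

# TRUE for columns to keep: v1 is constant once the NA is ignored, v2 varies
keep <- !apply(df, 2, function(x) sd(x, na.rm = TRUE) == 0)

# Only v2 survives, so the default drop = TRUE returns a bare integer vector
res <- df[, keep]
print(class(res))

# drop = FALSE keeps the result a one-column data frame
res_df <- df[, keep, drop = FALSE]
print(names(res_df))
```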


asked Jun 12 '15 08:06

spore234

2 Answers

Use Filter:

  Filter(function(x) sd(x) != 0, df)
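If you want Filter() to keep columns that vary but happen to contain NAs, the predicate can ignore missing values and treat an undefined standard deviation as "not worth keeping". This is a sketch under assumptions that are not part of the answer above: constant and entirely missing numeric columns are dropped, and non-numeric columns are left untouched. `keep_nonconstant` is a hypothetical helper name.

```r
df <- data.frame(
  v1 = c(0, 0, NA, 0, 0),    # constant once the NA is ignored
  v2 = 1:5,                  # varies
  v3 = c(1, 2, NA, 4, 5),    # varies but contains an NA
  v4 = rep(NA_real_, 5),     # entirely missing, so sd() is NA even with na.rm
  id = letters[1:5]          # non-numeric, left alone
)

# Hypothetical helper: keep non-numeric columns, and keep numeric columns only
# if their standard deviation (ignoring NAs) is strictly positive.
# isTRUE() turns an NA comparison into FALSE, so all-NA columns are dropped.
keep_nonconstant <- function(x) {
  !is.numeric(x) || isTRUE(sd(x, na.rm = TRUE) > 0)
}

res <- Filter(keep_nonconstant, df)
print(names(res))
```

Here v1 and v4 are removed while v2, v3 and id remain. Whether all-NA or non-numeric columns should be kept is a judgment call; adjust the predicate to taste.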

answered Sep 24 '22 04:09

grrgrrbla

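In addition to @grrgrrbla's and @akrun's answers using Filter, here is a sapply() version of what you originally had in mind:

  df <- df[, !sapply(df, function(x) sd(x) == 0), drop = FALSE]

Or:

  df <- df[, sapply(df, function(x) sd(x) != 0), drop = FALSE]

I used sapply() to get a logical vector with one element per column: in the first version it is TRUE when a column has a standard deviation of 0 and FALSE otherwise; the second version tests the opposite condition. Then I subset the original data frame using this vector; drop = FALSE keeps the result a data frame even if only one column remains. Like the apply() version, this still fails on columns that contain NA unless na.rm = TRUE is passed to sd().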
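A slightly more defensive sketch of the same idea, assuming you want NAs ignored: vapply() guarantees exactly one logical per column (sapply() can silently return a list or matrix if the function misbehaves), and isTRUE() turns an NA comparison into FALSE so that a column with an undefined standard deviation is kept rather than breaking the index. The names `is_constant` and `res` are illustrative only.

```r
df <- data.frame(v1 = c(0, 0, 0), v2 = c(1, 2, 3), v3 = c(5, NA, 5))

# One logical per column; TRUE means "constant once NAs are ignored".
# isTRUE() maps an NA result (e.g. an all-NA column) to FALSE, so it is kept.
is_constant <- vapply(df, function(x) isTRUE(sd(x, na.rm = TRUE) == 0), logical(1))

# drop = FALSE keeps a data frame even when only one column is left
res <- df[, !is_constant, drop = FALSE]
print(names(res))
```

Here v1 and v3 (constant apart from its NA) are removed and a one-column data frame containing v2 is returned.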

answered Sep 22 '22 04:09

Tim Biegeleisen
