I want to remove all columns with a standard deviation of zero from a data.frame.
This does not work:
df <- df[, ! apply(df , 2 , function(x) sd(x)==0 ) ]
I get the error:
undefined columns selected
UPDATE
I selected Filter as my preferred answer because it also seems to handle NAs, which is very useful.
For example, in
df <- data.frame(v1=c(0,0,NA,0,0), v2=1:5)
the column 'v1' is removed with Filter, while the apply methods produce errors.
Thanks to all the other solutions, I learned a lot from them.
UPDATE2:
Those errors given by apply can be fixed by adding na.rm = TRUE to the call to sd, like so:
df[, ! apply(df , 2 , function(x) sd(x, na.rm = TRUE)==0 ) ]
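As a minimal sketch of the fix above, using the example data from the first update: without na.rm the logical index contains NA, which is what triggers "undefined columns selected". The drop = FALSE argument is an addition here, not part of the original line; it is assumed to be wanted so that the result stays a data.frame when only one column survives.

```r
# Example data from the question: v1 is constant apart from one NA
df <- data.frame(v1 = c(0, 0, NA, 0, 0), v2 = 1:5)

# Without na.rm, sd() returns NA for v1, so the logical index contains NA
has_zero_sd_raw <- apply(df, 2, function(x) sd(x) == 0)

# With na.rm = TRUE the comparison is well defined for both columns
has_zero_sd <- apply(df, 2, function(x) sd(x, na.rm = TRUE) == 0)

# drop = FALSE (an addition to the original line) keeps a data.frame
# even when a single column remains
result <- df[, !has_zero_sd, drop = FALSE]
print(names(result))
```

Note that a column consisting only of NA would still give sd(x, na.rm = TRUE) equal to NA, so this sketch does not cover that case.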
For example, if we have a data frame called df, we can remove the rows that contain at least one 0 with the command df[apply(df, 1, function(x) all(x != 0)), ].
Whenever you have a column in a data frame with only one distinct value, that column will have zero variance.
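The point about distinct values suggests an alternative test that does not rely on sd at all. The following is only a sketch with made-up column names (x, y, z); it counts distinct values per column and keeps the columns that have more than one, which also works for non-numeric columns.

```r
# Hypothetical data: x and z each hold a single distinct value
df <- data.frame(x = rep(3, 4), y = c(1, 2, 3, 4), z = rep("a", 4))

# Number of distinct values in each column
n_distinct <- sapply(df, function(col) length(unique(col)))

# Columns with exactly one distinct value have zero variance
constant_cols <- names(n_distinct)[n_distinct == 1]
print(var(df$x))

# Keep only the columns that actually vary
df_varying <- df[, n_distinct > 1, drop = FALSE]
print(names(df_varying))
```

Here unique() treats NA as a value of its own, so a column such as c(0, 0, NA) counts as having two distinct values under this test.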
Use Filter:
Filter(function(x) sd(x) != 0, df)
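A short sketch of how this behaves with missing values, using hypothetical columns v3 and v4 added to the example from the question. Filter keeps only the elements for which the predicate is TRUE, so a column whose sd is NA is dropped as well. That removes v1, but it would also remove a column that varies and merely contains an NA, unless na.rm = TRUE is passed to sd.

```r
df <- data.frame(
  v1 = c(0, 0, NA, 0, 0),   # constant apart from one NA
  v2 = 1:5,                 # varies
  v3 = rep(7, 5),           # constant
  v4 = c(1, 2, NA, 4, 5)    # varies, but contains an NA
)

# Predicate is NA for v1 and v4, FALSE for v3, TRUE for v2
kept <- Filter(function(x) sd(x) != 0, df)
print(names(kept))

# With na.rm = TRUE, v4 is recognised as varying and is kept
kept_narm <- Filter(function(x) sd(x, na.rm = TRUE) != 0, df)
print(names(kept_narm))
```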
In addition to @grrgrrbla's and @akrun's answers using Filter, here is the correct way to do what you originally had in mind:
df <- df[, !sapply(df, function(x) { sd(x) == 0} )]
Or
df <- df[, sapply(df, function(x) { sd(x) != 0} )]
I used sapply() to get a logical vector that is TRUE when a data frame column has a standard deviation of 0 and FALSE otherwise. Then I subset the original data frame with the negation of this vector, which keeps only the columns that vary.
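To make the intermediate vector visible, here is a small sketch on made-up data (columns a, b and c are assumptions for illustration). It also adds drop = FALSE, which is not in the lines above, so that the result remains a data.frame when only one column is left.

```r
# Hypothetical data: a and c are constant, b varies
df <- data.frame(a = c(1, 1, 1), b = c(1, 2, 3), c = c(5, 5, 5))

# Named logical vector, one element per column
zero_sd <- sapply(df, function(x) sd(x) == 0)
print(zero_sd)

# Keep the columns whose standard deviation is not zero
df_kept <- df[, !zero_sd, drop = FALSE]
print(names(df_kept))
```

Without drop = FALSE, df[, !zero_sd] would return a plain vector here, because only column b remains.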