How to drop columns from data frame with less than 2 unique levels in R

Tags:

r

I have a dataset of ~200,000 rows with numeric and categorical variables, but many of the variables are constant (both numeric and categorical). I am trying to create a new dataset where the variables with length(unique(data.frame$factor)) <= 1 are dropped.

Example data set and attempts so far:

Temp=c(26:30)
Feels=c("cold","cold","cold","hot","hot")
Time=c("night","night","night","night","night")
Year=c(2015,2015,2015,2015,2015)
DF=data.frame(Temp,Feels,Time,Year)

I would think a loop would work, but neither of my two attempts below does what I want. I've tried:

for (i in unique(colnames(DF))){
  Reduced_DF <- DF[,(length(unique(DF$i)))>1]
}

But what I really need is a vector of the column names where length(unique(DF$column)) > 1, so I tried the following instead, to no avail:

for (i in unique(DF)){
  if (length(unique(DF$i)) >1)
  {keepvars <- c(DF$i)}
  Reduced_DF <- DF[keepvars]
}

Does anyone have experience with this kind of subsetting, i.e. dropping columns that have fewer than a certain number of unique levels?


asked Apr 20 '15 22:04

surfhoya

1 Answer

You can find out how many unique values are in each column with:

sapply(DF, function(col) length(unique(col)))
#  Temp Feels  Time  Year
#     5     2     1     1
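The same per-column count is what the loops in the question were trying to compute. They fail because DF$i looks for a column literally named "i" ($ does not evaluate its argument), and for (i in unique(DF)) loops over the columns themselves rather than their names. A minimal corrected loop, offered only as a sketch for comparison with the vectorized approach, indexes with DF[[i]] and appends each qualifying name to keepvars instead of overwriting it:

```r
Temp  <- 26:30
Feels <- c("cold", "cold", "cold", "hot", "hot")
Time  <- rep("night", 5)
Year  <- rep(2015, 5)
DF <- data.frame(Temp, Feels, Time, Year)

# Collect the names of columns that have more than one distinct value.
keepvars <- character(0)
for (i in names(DF)) {
  # DF[[i]] uses the value of i; DF$i would look for a column named "i"
  if (length(unique(DF[[i]])) > 1) {
    keepvars <- c(keepvars, i)  # append, rather than overwrite
  }
}
Reduced_DF <- DF[keepvars]
keepvars
# [1] "Temp"  "Feels"
```

Growing a vector inside a loop is fine for a handful of columns, but the sapply version is shorter and avoids the bookkeeping.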

You can use this to subset the columns:

DF[, sapply(DF, function(col) length(unique(col))) > 1]
#   Temp Feels
# 1   26  cold
# 2   27  cold
# 3   28  cold
# 4   29   hot
# 5   30   hot
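One caveat: [ with a single column index drops to a vector by default, so if only one column survives the filter, the result is no longer a data frame. Adding drop = FALSE avoids this. The sketch below wraps the idea in a small helper; the name drop_constant_cols and its min_unique argument are hypothetical, not from any package. It uses vapply instead of sapply so the counts are always an integer vector, even for a data frame with no columns.

```r
# Hypothetical helper: keep only columns with at least `min_unique` distinct values.
# Note that unique() counts NA as a value of its own.
drop_constant_cols <- function(df, min_unique = 2) {
  # vapply guarantees an integer vector of counts, one per column
  n_unique <- vapply(df, function(col) length(unique(col)), integer(1))
  # drop = FALSE keeps a data.frame even if only one column survives
  df[, n_unique >= min_unique, drop = FALSE]
}

DF <- data.frame(
  Temp  = 26:30,
  Feels = c("cold", "cold", "cold", "hot", "hot"),
  Time  = rep("night", 5),
  Year  = rep(2015, 5)
)

drop_constant_cols(DF)                  # keeps Temp and Feels
drop_constant_cols(DF, min_unique = 3)  # keeps only Temp, still a data.frame
```

Without drop = FALSE, the second call would return the bare integer vector 26:30 instead of a one-column data frame.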


answered Sep 20 '22 00:09

David Robinson
