
How to drop columns from data frame with less than 2 unique levels in R

Tags:

r

I have a dataset of ~200,000 rows with numeric and categorical variables, but many of the variables (both numeric and categorical) are constant. I am trying to create a new dataset in which every variable with length(unique(DF$column)) <= 1 is dropped.

Example data set and attempts so far:

Temp=c(26:30)
Feels=c("cold","cold","cold","hot","hot")
Time=c("night","night","night","night","night")
Year=c(2015,2015,2015,2015,2015)
DF=data.frame(Temp,Feels,Time,Year)

I would think a loop would work, but neither of my two attempts below does what I want. First I tried:

for (i in unique(colnames(DF))){
  Reduced_DF <- DF[,(length(unique(DF$i)))>1]
}

But I really need a vector of the colnames where length(unique(DF$columns))>1, so I tried the below instead, to no avail.

for (i in unique(DF)){
  if (length(unique(DF$i)) >1)
  {keepvars <- c(DF$i)}
  Reduced_DF <- DF[keepvars]
}

Does anyone have experience with this type of subsetting, i.e. dropping columns that have fewer than a certain number of unique levels?

surfhoya asked Apr 20 '15 22:04



1 Answer

You can find out how many unique values are in each column with:

sapply(DF, function(col) length(unique(col)))
#  Temp Feels  Time  Year 
#     5     2     1     1 

You can use this to subset the columns:

DF[, sapply(DF, function(col) length(unique(col))) > 1]
#   Temp Feels
# 1   26  cold
# 2   27  cold
# 3   28  cold
# 4   29   hot
# 5   30   hot
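
As a hedged sketch that builds on the approach above: the loops in the question likely fail because DF$i looks for a column literally named "i" (and returns NULL) instead of the column whose name is stored in i. DF[[i]] does the lookup by name. The version below also passes drop = FALSE, so the result stays a data frame even if only one column survives. The names n_unique, keepvars and Reduced_DF_loop are illustrative only.

```r
# Example data from the question
DF <- data.frame(
  Temp  = 26:30,
  Feels = c("cold", "cold", "cold", "hot", "hot"),
  Time  = rep("night", 5),
  Year  = rep(2015, 5)
)

# Count distinct values per column; vapply() enforces one integer per column
n_unique <- vapply(DF, function(col) length(unique(col)), integer(1))

# Keep columns with more than one distinct value;
# drop = FALSE prevents a single surviving column from collapsing to a vector
Reduced_DF <- DF[, n_unique > 1, drop = FALSE]

# Loop version of the same idea: collect the names of the columns to keep.
# DF[[i]] fetches the column named by the value of i.
keepvars <- character(0)
for (i in colnames(DF)) {
  if (length(unique(DF[[i]])) > 1) {
    keepvars <- c(keepvars, i)
  }
}
Reduced_DF_loop <- DF[keepvars]
```

Note that unique() treats NA as a value of its own, so a column containing one constant plus some NAs counts as having two unique values and is kept. If such columns should be dropped too, one option is to count length(unique(na.omit(col))) instead.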

David Robinson answered Sep 20 '22 00:09