I have a dataset of ~200,000 rows with numeric and categorical variables, but many of the variables (both numeric and categorical) are constant. I am trying to create a new dataset in which the variables with length(unique(data.frame$factor)) <= 1 are dropped.
Example data set and attempts so far:
Temp=c(26:30)
Feels=c("cold","cold","cold","hot","hot")
Time=c("night","night","night","night","night")
Year=c(2015,2015,2015,2015,2015)
DF=data.frame(Temp,Feels,Time,Year)
I would think a loop would work, but neither of my two attempts below does what I want. First I tried:
for (i in unique(colnames(DF))){
Reduced_DF <- DF[,(length(unique(DF$i)))>1]
}
But I really need a vector of the colnames where length(unique(DF$columns))>1, so I tried the below instead, to no avail.
for (i in unique(DF)){
if (length(unique(DF$i)) >1)
{keepvars <- c(DF$i)}
Reduced_DF <- DF[keepvars]
}
Does anyone have experience with this type of subsetting, i.e. dropping columns that have fewer than a certain number of unique values?
The easiest way to drop columns is with the subset() function. For example, subset(df, select = -c(x, z)) tells R to drop the variables x and z. The '-' sign indicates that the variables are dropped. Note that variable names must NOT be quoted when used this way in subset().
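As a minimal sketch of the subset() approach described above: the data frame df and its columns x, y, and z are hypothetical names made up for illustration, not part of the original question.

```r
# Hypothetical data frame with columns x, y, z (illustration only)
df <- data.frame(x = 1:3, y = c("a", "b", "c"), z = c(TRUE, FALSE, TRUE))

# Drop x and z: names are unquoted, and the minus sign means "exclude"
reduced <- subset(df, select = -c(x, z))

names(reduced)
```

subset() is convenient interactively, but because it relies on unquoted names it is less suited to cases where the columns to drop are computed, as in the question.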
The droplevels() function in R can be used to drop unused factor levels. It is particularly useful when some factor levels are no longer used after subsetting a vector or a data frame. Its basic form is droplevels(x), where x is the object from which to drop unused factor levels.
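A short sketch of what droplevels() does, reusing the Feels values from the question as a factor (the variable names feels and cold_only are chosen here for illustration):

```r
# Factor built from the Feels values in the question
feels <- factor(c("cold", "cold", "cold", "hot", "hot"))

# Subsetting keeps the full set of levels, even those no longer present
cold_only <- feels[feels == "cold"]
levels_before <- levels(cold_only)

# droplevels() removes the levels that have no remaining observations
levels_after <- levels(droplevels(cold_only))
```

Note that this only tidies levels. It does not remove constant columns, which is what the question asks for.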
We can delete multiple columns from an R data frame by assigning NULL to them through the list() function, for example df[c("x", "z")] <- list(NULL).
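A minimal sketch of the list(NULL) idiom, assuming a hypothetical data frame df with columns a, b, and c:

```r
# Hypothetical data frame (illustration only)
df <- data.frame(a = 1:3, b = 4:6, c = 7:9)

# Assigning list(NULL) to several columns at once removes them in place
df[c("a", "c")] <- list(NULL)

remaining <- names(df)
```

Unlike subset(), this takes the column names as a character vector, so the vector can be built programmatically.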
You can find out how many unique values are in each column with:
sapply(DF, function(col) length(unique(col)))
#  Temp Feels  Time  Year
#     5     2     1     1
You can use this to subset the columns:
DF[, sapply(DF, function(col) length(unique(col))) > 1]
#   Temp Feels
# 1   26  cold
# 2   27  cold
# 3   28  cold
# 4   29   hot
# 5   30   hot
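Since the question asks for a vector of column names, the same test can be sketched as follows. The names keepvars and Reduced_DF come from the question; keepvars_loop is introduced here only for illustration. The loops in the question fail mainly because DF$i looks for a column literally named "i"; DF[[i]] looks up the column whose name is stored in i.

```r
# Example data from the question
DF <- data.frame(
  Temp  = 26:30,
  Feels = c("cold", "cold", "cold", "hot", "hot"),
  Time  = rep("night", 5),
  Year  = rep(2015, 5)
)

# Names of the columns with more than one distinct value
keepvars <- names(DF)[sapply(DF, function(col) length(unique(col)) > 1)]

# drop = FALSE keeps a data frame even if only one column survives
Reduced_DF <- DF[, keepvars, drop = FALSE]

# Loop version of the same idea, collecting names one at a time
keepvars_loop <- character(0)
for (i in colnames(DF)) {
  if (length(unique(DF[[i]])) > 1) {
    keepvars_loop <- c(keepvars_loop, i)
  }
}
```

One caveat: unique() counts NA as a distinct value, so a column holding one value plus some NAs is kept by this test. If such columns should be treated as constant, the check would need to exclude NAs first.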