I want to subset a data frame by a factor column, keeping only the factor levels that occur at least a certain number of times.
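df <- data.frame(factor = c(rep("a", 5), rep("b", 5), rep("c", 2)), variable = rnorm(12), stringsAsFactors = TRUE)  # since R 4.0.0 strings stay character unless stringsAsFactors = TRUE
This code creates a data frame like the following (the values in variable are random):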
factor variable
1 a -1.55902013
2 a 0.22355431
3 a -1.52195456
4 a -0.32842689
5 a 0.85650212
6 b 0.00962240
7 b -0.06621508
8 b -1.41347823
9 b 0.08969098
10 b 1.31565582
11 c -1.26141417
12 c -0.33364069
I want to drop the factor levels that occur fewer than 5 times. I wrote a for loop that does this:
counts <- table(df$factor)
df.new <- df  # start from the full data so that every rare level is removed, not just the last one
for (i in seq_along(counts)) {
  if (counts[i] < 5) {
    df.new <- df.new[df.new$factor != names(counts)[i], ]
  }
}
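The loop hinges on table(), which counts the rows for each factor level in levels() order. A minimal sketch of inspecting those counts and the levels that fall below the cutoff, assuming the same example data; the set.seed() call and the name min_count are illustrative additions, not part of the question:

```r
set.seed(1)  # illustrative seed so rnorm() is reproducible
df <- data.frame(factor = c(rep("a", 5), rep("b", 5), rep("c", 2)),
                 variable = rnorm(12), stringsAsFactors = TRUE)

min_count <- 5                      # hypothetical name for the frequency cutoff
counts <- table(df$factor)          # rows per level, in levels() order
print(counts)

rare <- names(counts)[counts < min_count]  # levels that should be dropped
print(rare)
```

Once the rare levels are known, a single vectorized subset such as df[!df$factor %in% rare, ] replaces the loop entirely.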
Is there a quicker, cleaner solution?
Removing levels from a factor in R: the droplevels() function. droplevels() removes unused levels from a factor: droplevels(x, exclude = if(anyNA(levels(x))) NULL else NA, ...), where x is the object from which to drop unused factor levels. It is typically applied to factors or data frames and is particularly useful after subsetting a vector or a data frame, when some levels no longer appear in the data.
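Subsetting alone does not remove a level from the factor, which is why droplevels() pairs well with the filtering asked about here. A minimal sketch on the question's example data (set.seed() and the name sub are illustrative additions):

```r
set.seed(1)
df <- data.frame(factor = c(rep("a", 5), rep("b", 5), rep("c", 2)),
                 variable = rnorm(12), stringsAsFactors = TRUE)

# Subsetting removes the rows for "c", but the factor still remembers the level
sub <- df[df$factor != "c", ]
print(levels(sub$factor))   # "a" "b" "c"
print(table(sub$factor))    # "c" now has a count of 0

# droplevels() on a data frame drops unused levels from every factor column
sub <- droplevels(sub)
print(levels(sub$factor))   # "a" "b"
```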
How do I rename factor levels in R? The simplest way to rename several factor levels at once is to assign a new character vector to levels(). For example, to recode the factor levels "A", "B", and "C" of your_df$Category1, the new names replace the old ones by position: levels(your_df$Category1) <- c("Factor 1", "Factor 2", "Factor 3").
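A minimal sketch of both forms of levels<- on this question's data; the vector form matches by position, while the named-list form (documented in ?levels) spells out the mapping as new = old. The factor f and the names alpha, beta, gamma are illustrative:

```r
df <- data.frame(factor = c(rep("a", 5), rep("b", 5), rep("c", 2)),
                 variable = rnorm(12), stringsAsFactors = TRUE)

# Vector form: replacement values are matched to the existing levels by position
levels(df$factor) <- c("Factor 1", "Factor 2", "Factor 3")
print(levels(df$factor))

# Named-list form: each entry is new_name = "old_name"
f <- factor(c("a", "b", "c", "a"))
levels(f) <- list(alpha = "a", beta = "b", gamma = "c")
print(f)
```

The list form is safer when the level order is not obvious, because a wrong position in the vector form silently mislabels the data.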
library(dplyr)
# n() is the number of rows in the current group, so small groups are filtered out
df %>% group_by(factor) %>% filter(n() >= 5)
#factor variable
#1 a 2.0769363
#2 a 0.6187513
#3 a 0.2426108
#4 a -0.4279296
#5 a 0.2270024
#6 b -0.6839748
#7 b -0.3285610
#8 b 0.2625743
#9 b -0.9532957
#10 b 1.4526317
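For readers who prefer to avoid the dplyr dependency, a base R analogue of group_by() plus n() is ave() with FUN = length, which returns each row's group size. A minimal sketch, assuming the question's example data (set.seed() and the name group_size are illustrative additions):

```r
set.seed(1)
df <- data.frame(factor = c(rep("a", 5), rep("b", 5), rep("c", 2)),
                 variable = rnorm(12), stringsAsFactors = TRUE)

# ave() applies FUN within each group and returns one value per row;
# with FUN = length that value is the size of the row's group, like n()
group_size <- ave(seq_along(df$factor), df$factor, FUN = length)

# Keep rows from groups of at least 5, then remove the now-unused level "c"
df.new <- droplevels(df[group_size >= 5, ])
print(df.new)
```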
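Another base R option:
# Relies on df$factor being a factor: its integer codes line up with the positions returned by which(table(...))
df.new <- df[!(as.integer(df$factor) %in% which(table(df$factor) < 5)), ]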
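Matching on integer codes breaks if the column is a plain character vector, which is the default for data.frame() since R 4.0.0. A minimal sketch of a variant that matches on level names instead and takes the cutoff as a parameter; keep_frequent_levels is a hypothetical helper name, not an existing function:

```r
# Hypothetical helper: keep rows whose value in `col` occurs at least `min_count` times
keep_frequent_levels <- function(data, col, min_count) {
  counts <- table(data[[col]])
  keep <- names(counts)[counts >= min_count]
  # Match on names rather than integer codes, so character columns work too;
  # droplevels() then removes unused levels from any factor columns
  droplevels(data[data[[col]] %in% keep, , drop = FALSE])
}

set.seed(1)
df <- data.frame(factor = c(rep("a", 5), rep("b", 5), rep("c", 2)),
                 variable = rnorm(12))  # character column under R >= 4.0.0 defaults
df.new <- keep_frequent_levels(df, "factor", 5)
print(df.new)
```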