I have a factor with 2600 levels and I want to reduce it to ~10 before modelling.
I thought I could do this with an operation that says "if a level appears fewer than x times, place it in a bucket called 'other'".
Here is some example data:
df <- data.frame(colour=c("blue","blue","blue","green","green","orange","grey"))
And this is the output I am hoping for:
colour
1 blue
2 blue
3 blue
4 green
5 green
6 other
7 other
I have tried the below:
df %>% mutate(colour = ifelse(count(colour) < 2, 'other', colour))
Error in mutate_impl(.data, dots) : Evaluation error: no applicable method for 'groups' applied to an object of class "factor".
There is a nice package in the tidyverse called forcats, which helps in dealing with factors. You can use fct_lump, which lumps the least common levels into "Other"; with n = 2 it keeps the two most frequent levels:
library(tidyverse)
df %>% mutate(colour = fct_lump(colour, n = 2))
#> colour
#> 1 blue
#> 2 blue
#> 3 blue
#> 4 green
#> 5 green
#> 6 Other
#> 7 Other
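Since the question asks for a cut-off on the number of occurrences rather than a fixed number of levels, a count-based variant may be closer to what is wanted. The sketch below assumes a recent forcats release that provides fct_lump_min and fct_lump_n; the column names colour_min and colour_top are made up for illustration.

```r
library(forcats)

# Example data from the question; stringsAsFactors = TRUE makes colour a factor
df <- data.frame(
  colour = c("blue", "blue", "blue", "green", "green", "orange", "grey"),
  stringsAsFactors = TRUE
)

# Lump every level that appears fewer than 2 times into "other"
df$colour_min <- fct_lump_min(df$colour, min = 2, other_level = "other")

# Alternatively, keep only the n most frequent levels
# (n = 2 here; n = 10 would be the choice for the 2600-level factor)
df$colour_top <- fct_lump_n(df$colour, n = 2, other_level = "other")

print(df)
```

On this toy data both calls give the same grouping. On the real factor, fct_lump_n with n = 10 is probably the more direct way to end up with roughly ten levels.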
With tidyverse functions, you can try something like:
df %>%
  group_by(colour) %>%
  mutate(cnt = n()) %>%   # number of rows for each colour
  ungroup() %>%           # drop the grouping so later steps are not grouped
  mutate(grp = if_else(cnt >= 2, as.character(colour), "Other")) %>%
  select(-cnt)
which gives (here the threshold is a count of at least 2):
colour grp
<fct> <chr>
1 blue blue
2 blue blue
3 blue blue
4 green green
5 green green
6 orange Other
7 grey Other
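For completeness, the same thresholding can be sketched in base R without any packages. This is only one possible approach, not something the answers above propose; it assumes colour is stored as a factor, and min_count is a hypothetical name for the cut-off.

```r
# Example data from the question, with colour stored as a factor
df <- data.frame(
  colour = c("blue", "blue", "blue", "green", "green", "orange", "grey"),
  stringsAsFactors = TRUE
)

min_count <- 2  # hypothetical threshold: levels seen fewer times are merged

# Count how often each level occurs
counts <- table(df$colour)

# Names of the levels that fall below the threshold
rare <- names(counts)[counts < min_count]

# Assigning the same name to several levels merges them into one level
levels(df$colour)[levels(df$colour) %in% rare] <- "other"

print(df)
```

Renaming levels in place works on the level set rather than row by row, so it should stay cheap even when the factor has a few thousand levels.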