I have a problem similar to this: I want to select just the columns with fewer than "n" levels. I think I could do this using dplyr, but I don't know how.
Here is an example with the Titanic data, where str() shows that I have 3 factors with 2 levels and 1 factor with 4 levels. My idea is to select just the columns with fewer than 4 levels.
str(as.data.frame(Titanic) %>% mutate_if(is.character, factor))
Any ideas?
Thanks in advance.
Just pass a function to select_if, much like mutate_if (see ?nlevels):
Titanic %>%
as_tibble() %>%  # as_data_frame() is deprecated in favor of as_tibble()
mutate_if(is.character, factor) %>%
select_if(~ nlevels(.) < 4)
Note that you could also write this as: select_if(function(x) nlevels(x) < 4)
With the newer dplyr verbs, across() and where() (dplyr >= 1.0.0):
Titanic %>%
as_tibble() %>%  # as_data_frame() is deprecated in favor of as_tibble()
mutate(across(where(is.character), as.factor)) %>%
select(where(~ nlevels(.) < 4))
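One thing to watch for: nlevels() returns 0 for anything that is not a factor, so a numeric count column also passes the "fewer than 4 levels" test. If only factor columns are wanted, a rough sketch like the one below may help. The names max_levels, all_small and factors_only are made up for illustration, and it assumes dplyr >= 1.0.0 is installed.

```r
library(dplyr)

# Hypothetical threshold, chosen to match the question (fewer than 4 levels).
max_levels <- 4

# as.data.frame() on the Titanic table gives four factor columns
# (Class, Sex, Age, Survived) and one numeric column (Freq).
titanic_df <- as.data.frame(Titanic)

# Plain test: non-factor columns report 0 levels, so Freq is kept too.
all_small <- titanic_df %>%
  select(where(~ nlevels(.x) < max_levels))

# Stricter test: require a factor first, then check the number of levels.
factors_only <- titanic_df %>%
  select(where(~ is.factor(.x) && nlevels(.x) < max_levels))

names(all_small)
names(factors_only)
```

Here all_small should keep Sex, Age, Survived and Freq, while factors_only should keep only Sex, Age and Survived. Class is dropped in both cases because it has exactly 4 levels.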