
How to select only columns (type=factor) with less than n levels with dplyr?

Tags:

r

dplyr

I want to select only the factor columns with fewer than "n" levels. I think this can be done with dplyr, but I don't know how.

Here is an example with the Titanic data: str() shows 3 factors with 2 levels and 1 factor with 4 levels. My idea is to select only the columns with fewer than 4 levels.

library(dplyr)

str(as.data.frame(Titanic) %>% mutate_if(is.character, factor))

Any ideas?

Thanks in advance.

Wlademir Ribeiro Prates asked Oct 19 '25 14:10



2 Answers

Just pass a function to select_if, much like mutate_if -- see ?nlevels:

library(dplyr)

Titanic %>%
  as_tibble() %>%                      # as_data_frame() is deprecated in favour of as_tibble()
  mutate_if(is.character, factor) %>%
  select_if(~ nlevels(.) < 4)

Note that you could also write this as: select_if(function(x) nlevels(x) < 4)

JasonAizkalns answered Oct 22 '25 04:10



With the newer dplyr verbs, across() and where() (dplyr 1.0.0 and later):

Titanic %>%
  as_tibble() %>%
  mutate(across(where(is.character), .fns = as.factor)) %>%
  select(where(~ nlevels(.) < 4))
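
One caveat that may matter for the "type=factor" part of the question: nlevels() returns 0 for any column that is not a factor, so a test on nlevels() alone also lets numeric columns through (here, the count column). The sketch below is one possible way to restrict the selection to factors, by checking is.factor() first. The names titanic_tbl, loose and strict are only illustrative.

```r
library(dplyr)

# Same preparation as in the answers: the character columns become factors,
# and the numeric count column `n` is left untouched.
titanic_tbl <- Titanic %>%
  as_tibble() %>%
  mutate(across(where(is.character), as.factor))

# nlevels() is 0 for non-factors, so the numeric column passes this test too.
loose <- titanic_tbl %>%
  select(where(~ nlevels(.x) < 4))

# Checking is.factor() first keeps only factor columns with fewer than 4 levels.
strict <- titanic_tbl %>%
  select(where(~ is.factor(.x) && nlevels(.x) < 4))

names(loose)
names(strict)
```

If the count column should stay in the result, the looser test is already enough; the stricter predicate is only needed when non-factor columns have to be excluded.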
Tobias answered Oct 22 '25 04:10
