
best way to perform fct_lump on multiple columns [duplicate]

Tags:

r

I want to lump the infrequent levels of several factor variables into 'other'. The example below reproduces the problem: animal and color are the two factor variables I want to lump. It works when I handle one variable directly, but not when I put the column names in a vector and loop over them. My actual data set has tens of such variables, so I am looking for a clean way to do this with dplyr.

library(tidyverse)
library(forcats)

data <- data.frame(ID=rep(1:12), animal=c('dog','cat','fish','dog','dog','dog','fish','fish','fish','snake','fish','dog'),color=c('red','green','blue','red','green',
                                          'red','green','red','green','red','green','red'))

### Does not work when I use a list and for loop

factor_columns <- c('animal','color')
for (feature in factor_columns) {
  data <- data %>%
    mutate(feature = fct_lump_prop(
      f = feature,
      prop = 0.2,
      other_level = 'other'
    ))} 

### Works with one column

data <- data %>%
  mutate(animal = fct_lump_prop(
    f = animal,
    prop = 0.2,
    other_level = 'other'
  )) 
achilet asked Dec 21 '25 01:12


1 Answer

You can use across():

library(dplyr)
library(forcats)

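data %>%
  # all_of() selects the columns named in the character vector;
  # the lambda passes the extra arguments on to fct_lump_prop()
  mutate(across(all_of(factor_columns),
                ~ fct_lump_prop(.x, prop = 0.2, other_level = 'other')))
  # mutate_at in old dplyr
  # mutate_at(vars(factor_columns), fct_lump_prop, prop = 0.2, other_level = 'other')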

You can also use lapply():

data[factor_columns] <- lapply(data[factor_columns],
                               fct_lump_prop, prop = 0.2, other_level = 'other')
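
As a side note on why the loop in the question does not work: inside mutate(), `feature = ...` creates a column literally named "feature", and `f = feature` refers to the loop variable (a string) rather than the column it names. If you prefer to keep the loop, a minimal sketch is shown below. It assumes a dplyr version that supports the `.data` pronoun and `!!name :=` on the left-hand side; the sample data is copied from the question.

```r
library(dplyr)
library(forcats)

# Sample data copied from the question
data <- data.frame(
  ID = 1:12,
  animal = c('dog', 'cat', 'fish', 'dog', 'dog', 'dog',
             'fish', 'fish', 'fish', 'snake', 'fish', 'dog'),
  color = c('red', 'green', 'blue', 'red', 'green', 'red',
            'green', 'red', 'green', 'red', 'green', 'red')
)

factor_columns <- c('animal', 'color')

for (feature in factor_columns) {
  data <- data %>%
    # `!!feature :=` uses the string in `feature` as the output column name;
    # `.data[[feature]]` looks up the column with that name
    mutate(!!feature := fct_lump_prop(
      f = .data[[feature]],
      prop = 0.2,
      other_level = 'other'
    ))
}

# Inspect the resulting levels
levels(data$animal)
levels(data$color)
```

The across() and lapply() versions above are shorter; the loop is only worth keeping if each iteration needs to do extra work per column.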
Ronak Shah answered Dec 23 '25 17:12
