Strip trailing spaces from factor labels using dplyr chain

Tags: r, dplyr

I have loaded a data frame whose factor labels contain trailing whitespace. I am trying to strip the trailing spaces from every factor in the data frame, but have been unsuccessful so far.

Reproducible example

library(dplyr)

lvls <- c('a   ',
          'b   ',
          'c   ')
set.seed(314)
raw <- data.frame(a = factor(sample(lvls,100, replace=T)),
                  b = sample(1:100,100))

proc <- raw %>% mutate_each(funs(ifelse(is.factor(.),
                                        factor(as.character(trimws(.)),
                                               labels=unique(as.character(.))),
                                        .))) 

str(proc)

gives

'data.frame':   100 obs. of  2 variables:
 $ a: int  1 1 1 1 1 1 1 1 1 1 ...
 $ b: int  31 31 31 31 31 31 31 31 31 31 ...

This is wrong in two ways: the factor has lost its labels, and only the first observation is kept, repeated 100 times.

Wietze314 asked Jan 04 '17 15:01



1 Answer

mutate_if is your friend. If you don't mind the factor columns being converted to character, you can just use

raw %>% mutate_if(is.factor, trimws)

which suggests that you can just reconvert to factor:

raw %>% mutate_if(is.factor, funs(factor(trimws(.))))
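Note that funs() has been deprecated since dplyr 0.8.0; a purrr-style lambda does the same job. A minimal sketch on the question's data, assuming a dplyr version that accepts formula lambdas in mutate_if (0.8.0 or later). The name proc is reused from the question purely for illustration:

```r
library(dplyr)

# Rebuild the question's example data
lvls <- c('a   ', 'b   ', 'c   ')
set.seed(314)
raw <- data.frame(a = factor(sample(lvls, 100, replace = TRUE)),
                  b = sample(1:100, 100))

# Same idea as funs(factor(trimws(.))), written as a formula lambda:
# trim each factor column (this yields character), then convert back to factor
proc <- raw %>% mutate_if(is.factor, ~ factor(trimws(.x)))

str(proc)
levels(proc$a)
```

Only the factor column is touched; the integer column b passes through unchanged.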

If you want the column to stay a factor throughout (rather than passing through character and back), you can edit its levels directly with the more convoluted

raw %>% mutate_if(is.factor, funs(`levels<-`(., trimws(levels(.)))))
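The backticked `levels<-` is the replacement function behind levels(x) <- value, called in ordinary function form so that it fits inside a pipeline. A practical difference from factor(trimws(.)) is that editing the levels keeps their existing order, whereas rebuilding the factor sorts them. A small sketch on a standalone factor (f, g and h are illustrative names, not part of the question's data):

```r
# Illustrative factor with a deliberate, non-alphabetical level order
f <- factor(c("lo  ", "hi  ", "mid "), levels = c("lo  ", "mid ", "hi  "))

# Functional form of the replacement function, as used in the pipeline
g <- `levels<-`(f, trimws(levels(f)))

# Ordinary assignment form on a copy
h <- f
levels(h) <- trimws(levels(h))

identical(g, h)            # the two forms produce the same factor
levels(g)                  # original order kept: "lo" "mid" "hi"
levels(factor(trimws(f)))  # rebuilt from the values, so sorted: "hi" "lo" "mid"
```

If the level order carries meaning (for plotting or modelling, say), the levels-editing form is probably the safer choice.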

The base R equivalent would be

raw[] <- lapply(raw, function(x) {
  # only factors have their levels trimmed; other columns pass through
  if (is.factor(x)) levels(x) <- trimws(levels(x))
  x
})

though if it's a single variable and you know which, base is pretty clean:

levels(raw$a) <- trimws(levels(raw$a))

Edit: forcats::fct_relabel (forcats is part of the tidyverse) now makes changing levels with a function easier:

raw %>% mutate_if(is.factor, fct_relabel, trimws)

or for a single variable,

raw %>% mutate(a = fct_relabel(a, trimws))

It will accept anonymous functions as well, including purrr-style ~trimws(.x) if you like.
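In dplyr 1.0.0 and later the scoped verbs such as mutate_if are superseded by across(). A sketch of the same fct_relabel approach in that style, assuming dplyr >= 1.0.0 and forcats are installed; the data is the question's example and proc is just an illustrative name:

```r
library(dplyr)    # assumes dplyr >= 1.0.0 for across()
library(forcats)

# Rebuild the question's example data
lvls <- c('a   ', 'b   ', 'c   ')
set.seed(314)
raw <- data.frame(a = factor(sample(lvls, 100, replace = TRUE)),
                  b = sample(1:100, 100))

# Apply fct_relabel() with trimws to every factor column,
# using a purrr-style lambda
proc <- raw %>%
  mutate(across(where(is.factor), ~ fct_relabel(.x, trimws)))

levels(proc$a)
```

As with the levels-editing approach, the columns remain factors and non-factor columns are left alone.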

alistaire answered Nov 15 '22 08:11
