I have a data frame whose factor labels contain trailing whitespace. I am trying to remove the trailing spaces from every factor in the data frame, but have been unsuccessful so far.
Reproducible example:
library(dplyr)
lvls <- c('a ', 'b ', 'c ')
set.seed(314)
raw <- data.frame(a = factor(sample(lvls, 100, replace = TRUE)),
                  b = sample(1:100, 100))
# Attempt: trim the labels of every factor column, leave other columns alone
proc <- raw %>% mutate_each(funs(ifelse(is.factor(.),
                                        factor(as.character(trimws(.)),
                                               labels = unique(as.character(.))),
                                        .)))
str(proc)
gives
'data.frame': 100 obs. of 2 variables:
$ a: int 1 1 1 1 1 1 1 1 1 1 ...
$ b: int 31 31 31 31 31 31 31 31 31 31 ...
This is wrong in two ways: the factor has lost its labels and become an integer column, and in both columns the first observation is simply repeated 100 times.
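Both symptoms appear to come from how ifelse() works: it returns a result shaped like its test argument, and is.factor(.) is a single TRUE or FALSE, so only the first element comes back and any factor attributes are dropped. A minimal base R sketch of this, using made-up vectors x and y that are not part of the original example:

```r
# x and y are illustrative stand-ins for a factor column and an integer column
x <- factor(c("a ", "b ", "c "))
y <- c(31L, 5L, 7L)

# The test has length 1, so ifelse() returns a length-1 result
res_x <- ifelse(is.factor(x), factor(trimws(x)), x)
res_y <- ifelse(is.factor(y), factor(trimws(y)), y)

length(res_x)     # a single value rather than three
is.factor(res_x)  # the factor class has been stripped
res_y             # only the first element of y survives
```

Inside mutate, that single value is then recycled to the full column length, which would explain the repeated first observation.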
mutate_if is your friend. If you don't mind converting the factors to character, you can just use
raw %>% mutate_if(is.factor, trimws)
which suggests that you can just reconvert to factor:
raw %>% mutate_if(is.factor, funs(factor(trimws(.))))
If you want the columns to stay factors throughout, without the round trip through character, you can use the more convoluted
raw %>% mutate_if(is.factor, funs(`levels<-`(., trimws(levels(.)))))
The base R equivalent would be
raw[] <- lapply(raw, function(x){if (is.factor(x)) {levels(x) <- trimws(levels(x))} ; x})
though if it's a single variable and you know which, base is pretty clean:
levels(raw$a) <- trimws(levels(raw$a))
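One thing worth knowing about assigning to levels(): if two levels differ only by trailing whitespace, trimming makes them identical and they are merged into one level. A small sketch on a made-up factor (x is purely illustrative):

```r
# Illustrative factor where "a" and "a " are distinct levels before trimming
x <- factor(c("a", "a ", "b "))
nlevels(x)                      # three levels before trimming

# Trimming produces duplicate level names, which levels<- merges
levels(x) <- trimws(levels(x))

levels(x)                       # the remaining levels after merging
as.character(x)                 # values now use the trimmed labels
```

This is usually the desired outcome when cleaning labels, but it does mean the number of levels can shrink.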
Edit: Now forcats::fct_relabel (part of the tidyverse) makes changing levels with a function easier:
raw %>% mutate_if(is.factor, fct_relabel, trimws)
or for a single variable,
raw %>% mutate(a = fct_relabel(a, trimws))
It will accept anonymous functions as well, including purrr-style ~trimws(.x) if you like.
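In more recent dplyr releases (1.0.0 and later), the scoped verbs such as mutate_if are superseded by across(), and funs() is deprecated. A sketch of the same idea in that style, assuming dplyr >= 1.0.0 and forcats are installed; the data frame simply rebuilds the example from the question:

```r
library(dplyr)    # assumes dplyr >= 1.0.0 for across() and where()
library(forcats)

# Same example data as in the question
set.seed(314)
raw <- data.frame(a = factor(sample(c("a ", "b ", "c "), 100, replace = TRUE)),
                  b = sample(1:100, 100))

# Relabel the levels of every factor column; other columns are left untouched
proc <- raw %>%
  mutate(across(where(is.factor), ~ fct_relabel(.x, trimws)))

str(proc)
levels(proc$a)
```

The where(is.factor) selector plays the role of the predicate in mutate_if, and the purrr-style lambda is the function applied to each selected column.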