 

Reorder levels of a factor without changing order of values

Use the levels argument of factor:

# since R 4.0.0, data.frame() no longer turns strings into factors by default
df <- data.frame(f = 1:4, g = factor(letters[1:4]))
df
#   f g
# 1 1 a
# 2 2 b
# 3 3 c
# 4 4 d

levels(df$g)
# [1] "a" "b" "c" "d"

df$g <- factor(df$g, levels = letters[4:1])
levels(df$g)
# [1] "d" "c" "b" "a"

df
#   f g
# 1 1 a
# 2 2 b
# 3 3 c
# 4 4 d
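To see what `levels =` actually changed, compare the printed values with the underlying integer codes before and after. Below is a minimal sketch on a standalone vector (`x` and `y` are names introduced here for illustration only): the character values stay put, while the codes are renumbered to point into the new level order. The new order also affects anything that sorts or tabulates by level, such as `sort()` and `table()`.

```r
# A factor with the default (alphabetical) level order
x <- factor(c("a", "b", "c", "d"))
as.integer(x)          # 1 2 3 4: codes index into levels a, b, c, d

# Re-create the factor with the levels in reverse order
y <- factor(x, levels = letters[4:1])

# The values, read as characters, are unchanged...
identical(as.character(x), as.character(y))   # TRUE
# ...but the codes now index into d, c, b, a
as.integer(y)          # 4 3 2 1
levels(y)              # "d" "c" "b" "a"

# Sorting follows the level order, not the alphabet
as.character(sort(y))  # "d" "c" "b" "a"
```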

Some more, just for the record:

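## reorder() is a generic from base R's stats package; its default method
## orders the levels by the mean of a score vector (here, each value's
## position in letters[4:1]) and leaves the values themselves alone
df$letters <- reorder(df$letters, match(df$letters, letters[4:1]))

## gdata adds a reorder() method for factors that takes new.order
library(gdata)
df$letters <- reorder(df$letters, new.order = letters[4:1])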

You may also find relevel() and combine_factor() useful.
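As a small illustration of `relevel()` (from base R's stats package), the sketch below moves a single level to the front while the remaining levels keep their relative order; it is often used to choose the reference level for model contrasts. The vector `x` and the choice of `"c"` are hypothetical, and `combine_factor()` is not shown here.

```r
x <- factor(c("a", "b", "c", "d"))

# Move "c" to the front; a, b, d keep their existing order
x2 <- relevel(x, ref = "c")
levels(x2)         # "c" "a" "b" "d"

# The values themselves are untouched
as.character(x2)   # "a" "b" "c" "d"
```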


Since this question was last active, Hadley has released his new forcats package for manipulating factors, and I'm finding it outrageously useful. Examples using the OP's data frame:

levels(df$letters)
# [1] "a" "b" "c" "d"

To reverse levels:

library(forcats)
library(magrittr)  # provides the %>% pipe
fct_rev(df$letters) %>% levels
# [1] "d" "c" "b" "a"

To add more levels:

fct_expand(df$letters, "e") %>% levels
# [1] "a" "b" "c" "d" "e"

And there are many more useful fct_*() functions.
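If you would rather not add a dependency, the two forcats calls above have straightforward base R counterparts: both come down to re-calling `factor()` with a new `levels` vector. This is a sketch under that assumption, on a hypothetical vector `x`; it mirrors the outputs shown above but is not how forcats itself is implemented.

```r
x <- factor(c("a", "b", "c", "d"))

# Reverse the level order (like fct_rev()); values unchanged
x_rev <- factor(x, levels = rev(levels(x)))
levels(x_rev)   # "d" "c" "b" "a"

# Append an unused level at the end (like fct_expand(x, "e"))
x_exp <- factor(x, levels = c(levels(x), "e"))
levels(x_exp)   # "a" "b" "c" "d" "e"

# The new level exists but has no observations
table(x_exp)
```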


So what you want, in R lexicon, is to change only the labels for a given factor variable (i.e., leave the data, as well as the factor levels, unchanged).

df$letters = factor(df$letters, labels=c("d", "c", "b", "a"))
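It can help to put `levels =` and `labels =` side by side on the same input. In the sketch below (`x`, `by_levels` and `by_labels` are illustrative names), `levels =` reorders the level set so each value keeps its label, while `labels =` renames the existing levels position by position: the integer codes stay exactly as they were, but the displayed values change, so the first element now prints as "d".

```r
x <- factor(c("a", "b", "c", "d"))

# levels = : new level order, every value keeps its label
by_levels <- factor(x, levels = c("d", "c", "b", "a"))
as.character(by_levels)   # "a" "b" "c" "d"

# labels = : same codes, the levels a, b, c, d are renamed d, c, b, a
by_labels <- factor(x, labels = c("d", "c", "b", "a"))
as.integer(by_labels)     # 1 2 3 4 (same codes as x)
as.character(by_labels)   # "d" "c" "b" "a"
```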

Given that you want to change only the datapoint-to-label mapping, and not the data or the factor schema (how the datapoints are binned into individual factor values), it might help to know how the mapping is set when you first create the factor.

The rules are simple:

  • labels are mapped to levels by index (i.e., the value at levels[2] is given the label labels[2]);
  • factor levels can be set explicitly by passing them in via the levels argument; or
  • if no value is supplied for the levels argument, the default is used, which is the sorted unique values of the data vector passed in (the x argument);
  • labels can be set explicitly via the labels argument; or
  • if no value is supplied for the labels argument, the default is used, which is just the levels vector.
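The rules above can be checked on a small character vector. In this sketch (`v`, `f1`, `f2`, `f3` are hypothetical names), note that the default levels come out sorted, not in order of first appearance, and that `labels[i]` is attached to `levels[i]`. Sorting of character levels depends on the locale; for single lowercase letters the result is the same everywhere.

```r
v <- c("b", "a", "c", "a")
unique(v)                 # "b" "a" "c" (order of first appearance)

# No levels given: default is the sorted unique values
f1 <- factor(v)
levels(f1)                # "a" "b" "c"

# Explicit levels and labels: labels[i] is attached to levels[i]
f2 <- factor(v, levels = c("a", "b", "c"), labels = c("x", "y", "z"))
as.character(f2)          # "y" "x" "z" "x"

# Explicit levels control both the level order and the integer codes;
# with no labels given, the labels are just these levels
f3 <- factor(v, levels = c("c", "b", "a"))
as.integer(f3)            # 2 3 1 3
levels(f3)                # "c" "b" "a"
```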