idiomatic way to combine levels of a categorical factor [duplicate]

Tags: r, dplyr

Here's a trivial example of what I'm trying to do:

library(dplyr)

iris %>%
  mutate(Species2 = ifelse(Species %in% c("setosa", "virginica"), "other", as.character(Species)) %>% as.factor) %>%
  str
# 'data.frame': 150 obs. of  6 variables:
# $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
# $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
# $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
# $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
# $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
# $ Species2    : Factor w/ 2 levels "other","versicolor": 1 1 1 1 1 1 1 1 1 1 ...

However, if I want to do multiple merges, I'd end up with deeply nested ifelse statements, which I'm trying to avoid. What's the most elegant way to do this? Ideally the solution would fit into a dplyr pipeline.

kevinykuo asked Nov 09 '22 17:11


1 Answer

You can use match:

species.keep <- c("setosa", "virginica", "other")
iris %>% mutate(Species2 = species.keep[match(Species, species.keep, nomatch=3)])

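The nomatch argument of match sends any species not found in species.keep to position 3, which holds "other". As written, this keeps setosa and virginica and lumps everything else into "other"; to get a different grouping, list the levels you want to keep ahead of "other" and set nomatch to the position of "other". Note this assumes "other" is not already a valid species. The result is a character vector, so you'll still have to add the as.factor etc., but this should get you what you want. match is the basic lookup function in base R.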
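As a rough sketch of how the match approach could be finished off inside a dplyr pipeline: the first part reproduces the grouping from the question (keep versicolor, merge the rest into "other"), and the second part uses a two-vector lookup table for several merges at once. The names iris2, from, to, Species3, group_a and group_b are made up for illustration and are not from the question or the answer.

```r
library(dplyr)

# Levels to keep as-is; "other" sits last and catches everything else.
species.keep <- c("versicolor", "other")

# Lookup table for several merges at once (hypothetical group names):
# from[i] is an original level, to[i] is the merged level it maps to.
from <- c("setosa", "virginica", "versicolor")
to   <- c("group_a", "group_a", "group_b")

iris2 <- iris %>%
  mutate(
    # Unmatched species fall through to the last position, i.e. "other".
    Species2 = factor(
      species.keep[match(Species, species.keep, nomatch = length(species.keep))],
      levels = species.keep
    ),
    # Every level is listed in `from`, so no nomatch is needed here.
    Species3 = factor(to[match(Species, from)])
  )

table(iris2$Species2)
table(iris2$Species3)
```

Passing levels = species.keep to factor fixes the level order; plain as.factor would sort the levels alphabetically instead. With the lookup-table variant, any level missing from `from` would come out as NA, so in real data it is worth checking for NAs after the mutate.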

BrodieG answered Nov 15 '22 07:11