
How can I maintain a color scheme across ggplots, while dropping unused levels in each plot?

I want to compare some sub-groups of my data in one plot and some other sub-groups in another plot. If I make one plot with all sub-groups plotted, the figure is overwhelming and each individual comparison becomes difficult. I think it will make more sense to the reader if a given subgroup is the same color across all plots.

Here are two things I've tried that almost work, but neither quite does. They're as close as I can come to an MWE!

Wrong because all three levels are shown in the legend

library(tidyverse)

# compare first and second species
ggplot(data = iris %>% filter(Species != 'virginica'),
       mapping = aes(x = Sepal.Length,
                     y = Sepal.Width,
                     color = Species)) +
  geom_point() +
  scale_color_discrete(drop = FALSE)


# compare second and third species
ggplot(data = iris %>% filter(Species != 'setosa'),
       mapping = aes(x = Sepal.Length,
                     y = Sepal.Width,
                     color = Species)) +
  geom_point() +
  scale_color_discrete(drop = FALSE)

Note that the un-plotted level still appears in the legend (consistent with the idea of drop = FALSE).
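The un-plotted level shows up because filter() removes rows but not factor levels, so Species in the subset still carries all three levels. A quick check, using dplyr (loaded by tidyverse) and base R's droplevels():

```r
library(dplyr)

sub <- iris %>% filter(Species != 'virginica')

# filter() drops rows, but the factor still remembers every level
levels(sub$Species)               # "setosa" "versicolor" "virginica"

# droplevels() removes levels that no longer occur in the data
levels(droplevels(sub)$Species)   # "setosa" "versicolor"
```

Dropping the level alone would remove the extra legend entry, but the default hue palette would then be computed for two levels instead of three, so the colors would still differ between plots.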

Wrong because the second plot doesn't maintain the species-color mapping established by the first plot

# compare first and second species
ggplot(data = iris %>% filter(Species != 'virginica'),
       mapping = aes(x = Sepal.Length,
                     y = Sepal.Width,
                     color = Species)) +
  geom_point() +
  scale_color_manual(values = c('red', 'forestgreen', 'blue'),
                     breaks = unique(iris$Species))


# compare second and third species
ggplot(data = iris %>% filter(Species != 'setosa'),
       mapping = aes(x = Sepal.Length,
                     y = Sepal.Width,
                     color = Species)) +
  geom_point() +
  scale_color_manual(values = c('red', 'forestgreen', 'blue'),
                     breaks = unique(iris$Species))

Note that in the left plot setosa = red and versicolor = green, but in the right plot that mapping is changed: versicolor is now red.
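The colors in values are unnamed, so they appear to be handed out by position to whichever levels are present in each plot. The snippet below is only a simplified base-R model of that behavior under this assumption, not ggplot2's actual code; assign_by_position() is a hypothetical helper:

```r
values <- c('red', 'forestgreen', 'blue')

# Hypothetical helper: pair unnamed colors with the present levels purely by position
assign_by_position <- function(values, present_levels) {
  setNames(values[seq_along(present_levels)], present_levels)
}

left  <- assign_by_position(values, c('setosa', 'versicolor'))
right <- assign_by_position(values, c('versicolor', 'virginica'))

left[['versicolor']]    # "forestgreen"
right[['versicolor']]   # "red"
```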

asked Mar 19 '17 by rcorty


1 Answer

The most reliable way is to create a named vector of colors, one for each level (species), and use it in every plot.

Here you can use the same colors as above; by naming each element after its level, you ensure that colors and species always match up:

irisColors <- setNames(c('red', 'forestgreen', 'blue'),
                       levels(iris$Species))

This gives:

setosa     versicolor     virginica 
 "red"  "forestgreen"        "blue"
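Because the vector is named, indexing it by level name returns the same color no matter which subset of species is requested. A small base-R check:

```r
irisColors <- setNames(c('red', 'forestgreen', 'blue'), levels(iris$Species))

# Look up colors by name for each subset of species
irisColors[c('setosa', 'versicolor')]
irisColors[c('versicolor', 'virginica')]

# versicolor maps to the same color in both lookups
irisColors[['versicolor']]   # "forestgreen"
```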

And then you can use irisColors to set your colors:

First, with all three species:

ggplot(data = iris,
       mapping = aes(x = Sepal.Length,
                     y = Sepal.Width,
                     color = Species)) +
  geom_point() +
  scale_color_manual(values = irisColors)


Then each of the subsets from your question:

ggplot(data = iris %>% filter(Species != 'virginica'),
       mapping = aes(x = Sepal.Length,
                     y = Sepal.Width,
                     color = Species)) +
  geom_point() +
  scale_color_manual(values = irisColors)


ggplot(data = iris %>% filter(Species != 'setosa'),
       mapping = aes(x = Sepal.Length,
                     y = Sepal.Width,
                     color = Species)) +
  geom_point() +
  scale_color_manual(values = irisColors)
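To avoid repeating the plotting code for every subset, the same idea can be wrapped in a small function. plot_species() below is a hypothetical helper, not part of ggplot2; it filters iris to the requested species and reuses irisColors. ggplot_build() is then used to inspect the colors actually assigned to the points.

```r
library(ggplot2)
library(dplyr)

irisColors <- setNames(c('red', 'forestgreen', 'blue'), levels(iris$Species))

# Hypothetical helper: scatter plot of the chosen species with fixed colors
plot_species <- function(species) {
  ggplot(data = iris %>% filter(Species %in% species),
         mapping = aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
    geom_point() +
    scale_color_manual(values = irisColors)
}

p1 <- plot_species(c('setosa', 'versicolor'))
p2 <- plot_species(c('versicolor', 'virginica'))

# Layer data after building; the colour column holds each point's color
built1 <- ggplot_build(p1)$data[[1]]
built2 <- ggplot_build(p2)$data[[1]]
unique(built1$colour)
unique(built2$colour)
```

Both plots draw versicolor in forestgreen, and each legend lists only the species present in that subset.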


answered Oct 26 '22 by Mark Peterson