I'll use violin plots here as an example, but the question extends to many other ggplot types.
I know how to split my data along the x-axis by a factor:
ggplot(iris, aes(x = Species, y = Sepal.Length)) +
  geom_violin() +
  geom_point(position = "jitter")

And I know how to plot only the full dataset:
ggplot(iris, aes(x = 1, y = Sepal.Length)) +
  geom_violin() +
  geom_point(position = "jitter")

My question is: is there a way to plot the full data AND a subset-by-factor side-by-side in the same plot? In other words, for the iris data, could I make a violin plot that has both "full data" and "setosa" along the x-axis?
This would enable a comparison of the distribution of a full dataset and a subset of that dataset. If this isn't possible, any recommendations on a better way to visualise this would also be welcome :)
Thanks for any ideas!
Using a constant x value for the full data and layering the per-species geoms on top:
ggplot(iris, aes(x = "All", y = Sepal.Length)) +
  # Full dataset at a single x position labelled "All"
  geom_violin() +
  geom_point(aes(color = "All"), position = "jitter") +
  # Per-species layers override x, adding one violin per species
  geom_violin(aes(x = Species)) +
  geom_point(aes(x = Species, color = Species), position = "jitter") +
  # Named values so each colour stays tied to its group
  scale_color_manual(values = c(All = "black", setosa = "#F8766D",
                                versicolor = "#00BA38", virginica = "#619CFF")) +
  theme_minimal(base_size = 16) +
  theme(axis.title.x = element_blank(), legend.title = element_blank())
gives a single plot with an "All" violin alongside one violin per species.
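
If you only want "All" next to a single subset such as "setosa" (as asked in the question), another option is to stack the full data and the subset into one data frame and map a label column to x. This is only a sketch, not the only way to do it: the names `all_rows`, `setosa_rows`, `combined` and `group` are made up for illustration, and the `seed` argument of `position_jitter()` assumes a reasonably recent ggplot2.

```r
library(ggplot2)

# Full data, relabelled so it occupies a single x-axis position.
all_rows <- transform(iris, group = "All")

# The subset of interest, labelled with its species name.
setosa_rows <- transform(subset(iris, Species == "setosa"), group = "setosa")

# Stack both; setosa rows appear twice (once under "All", once under "setosa").
combined <- rbind(all_rows, setosa_rows)

# Fix the x-axis order explicitly instead of relying on alphabetical sorting.
combined$group <- factor(combined$group, levels = c("All", "setosa"))

p <- ggplot(combined, aes(x = group, y = Sepal.Length)) +
  geom_violin() +
  geom_point(aes(color = group),
             position = position_jitter(width = 0.2, seed = 1)) +
  theme_minimal(base_size = 16) +
  theme(axis.title.x = element_blank(), legend.title = element_blank())
p
```

Because the subset rows are duplicated on purpose, `combined` should only be used for plotting, not for summary statistics. The same pattern extends to more subsets by adding further labelled pieces to the `rbind()` call.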
