In ggplot2, how do I get a function used inside the x or y aesthetic to be applied after the data have been split into groups (e.g. by group or color)?
I'm trying to find a way to have ggplot apply a function within a group while plotting.
Suppose we have a population whose members all have a hidden value. The rank (and therefore the empirical CDF) of these hidden values is exposed.
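library(data.table)
library(ggplot2)
# Three classes with three members each
my_data <- data.table(class = sort(rep(x = c('a','b','c'), times = 3)))
# Hidden values, in the same row order as my_data
hidden <- c(10, 15, 80,
            0, 50, 100,
            5, 90, 95)
# Only the overall rank (empirical CDF value) of each hidden value is exposed
my_data[, rank := ecdf(hidden)(hidden)]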
I can use the overall CDF to infer the CDF inside each class. I then want to graph each class's CDF against the overall CDF, which helps me see whether the distribution of the hidden value is consistent between classes.
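The inference works because, on the observed values, the overall rank is a strictly increasing function of the hidden value, so ordering by rank within a class is the same as ordering by the hidden value. A quick check, assuming (unlike the question) that hidden is stored as a column so it can be grouped, should return TRUE:

```r
library(data.table)

# Assumption: `hidden` is stored as a column here so it can be grouped by class
my_data <- data.table(class = sort(rep(x = c('a','b','c'), times = 3)),
                      hidden = c(10, 15, 80, 0, 50, 100, 5, 90, 95))
my_data[, rank := ecdf(hidden)(hidden)]

# Within each class, compare the ECDF of the exposed overall rank with the
# ECDF of the hidden values themselves
check <- my_data[, .(from_rank = ecdf(rank)(rank),
                     from_hidden = ecdf(hidden)(hidden)),
                 by = class]
all.equal(check$from_rank, check$from_hidden)
```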
After several attempts, I'm surprised that this doesn't work. I expected that setting the grouping in the top-level aesthetic mapping would make the function be applied per group, the same way stats are. Instead, ecdf(rank)(rank) is evaluated on the entire column, so y ends up equal to x.
ggplot(data = my_data, mapping = aes(color = class)) +
geom_line(mapping = aes(
x = rank,
y = ecdf(rank)(rank)
))
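The symptom can be confirmed with ggplot2's layer_data(), which builds the plot and returns the data a layer actually draws. A minimal sketch rebuilding the question's data; the TRUE it prints shows that the aes() expression saw all nine rows at once:

```r
library(data.table)
library(ggplot2)

# The question's data
my_data <- data.table(class = sort(rep(x = c('a','b','c'), times = 3)))
hidden <- c(10, 15, 80, 0, 50, 100, 5, 90, 95)
my_data[, rank := ecdf(hidden)(hidden)]

p <- ggplot(data = my_data, mapping = aes(color = class)) +
  geom_line(mapping = aes(x = rank, y = ecdf(rank)(rank)))

# layer_data() builds the plot and returns the data drawn by layer 1
built <- layer_data(p)
# ecdf(rank)(rank) was evaluated on the whole column, so y equals x
all.equal(built$y, built$x)
```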

Here, by contrast, is an example where a stat is applied separately to each color group.
ggplot(data = my_data, mapping = aes(color = class)) +
  geom_density(mapping = aes(
    x = rank,
    # after_stat() replaces the older ..scaled.. notation (ggplot2 >= 3.3.0)
    y = after_stat(scaled)
  ))
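To see that the stat, unlike the aes() expression, is computed per group, the built layer can be inspected in the same way. Since scaled is each density divided by its own maximum, every color group should peak at exactly 1. A sketch using the question's data:

```r
library(data.table)
library(ggplot2)

my_data <- data.table(class = sort(rep(x = c('a','b','c'), times = 3)))
hidden <- c(10, 15, 80, 0, 50, 100, 5, 90, 95)
my_data[, rank := ecdf(hidden)(hidden)]

p <- ggplot(data = my_data, mapping = aes(color = class)) +
  geom_density(mapping = aes(x = rank, y = after_stat(scaled)))

built <- layer_data(p)
# One density curve per color group, each rescaled so its own maximum is 1
tapply(built$y, built$group, max)
```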

Through the magic of split-apply-combine (here done with the by argument of data.table), I can add an extra column to my data that holds the within-class CDF.
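# Within-class ECDF; the overall rank preserves the order of the hidden
# values, so it can stand in for them inside each class
my_data[, class_rank := ecdf(rank)(rank), by = class]
ggplot(data = my_data, mapping = aes(color = class)) +
  geom_line(mapping = aes(
    x = rank,
    y = class_rank
  ))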
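For comparison, the same split-apply-combine step can be done in base R with ave(), without data.table. A minimal sketch reusing the question's data as a plain data.frame:

```r
library(ggplot2)

# Question's data as a plain data.frame (no data.table needed)
my_data <- data.frame(class = sort(rep(x = c('a','b','c'), times = 3)))
hidden <- c(10, 15, 80, 0, 50, 100, 5, 90, 95)
my_data$rank <- ecdf(hidden)(hidden)

# ave() splits `rank` by `class`, applies FUN to each piece and puts the
# results back in the original row order
my_data$class_rank <- ave(my_data$rank, my_data$class,
                          FUN = function(r) ecdf(r)(r))

ggplot(data = my_data, mapping = aes(x = rank, y = class_rank, color = class)) +
  geom_line()
```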

Throwing extra columns onto my data isn't the worst thing, but ggplot2 does enough awesome stuff already that I feel like this is in there and I just can't find it.
The expressions mapped in aes() are evaluated on the layer's whole data. Grouping by mapping other aesthetics has no effect at this point; only statistics, such as those used by stat_summary() and stat_smooth(), are computed separately for each group.

Consequently, I think the only way of achieving what you ask within 'ggplot2' would be to use a statistic that doesn't yet exist. Defining a new statistic that summarizes the x aesthetic ignoring groups and the y aesthetic respecting the grouping should be doable, I think, but is it worth the effort? One can easily pre-process the data within the 'tidyverse', as shown below, or with 'data.table', as in your own example.
library(ggplot2)
library(dplyr)
my_data <- data.frame(class = sort(rep(x = c('a','b','c'), times = 3)),
                      hidden = c(10, 15, 80, 0, 50, 100, 5, 90, 95))
my_data %>%
  # overall ECDF, computed on the ungrouped data
  mutate(rank = ecdf(hidden)(hidden)) %>%
  # within-class ECDF, computed separately for each class
  group_by(class) %>%
  mutate(class_rank = ecdf(hidden)(hidden)) %>%
  ggplot(aes(rank, class_rank, color = class)) +
  geom_line()
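For anyone who does want to try the new-statistic route mentioned above, here is a rough sketch rather than a polished extension. A Stat that overrides compute_panel() receives all groups of a panel together, so it can compute x across the whole panel and y within each group. StatGroupVsOverallEcdf and stat_group_vs_overall_ecdf() are hypothetical names, not part of ggplot2, and the sketch assumes x is mapped to the hidden values (mapping the overall rank instead would give the same result):

```r
library(ggplot2)

# Hypothetical Stat (not part of ggplot2). compute_panel() receives every
# group of a panel at once, so x can be computed ignoring the groups while y
# is computed within each group.
StatGroupVsOverallEcdf <- ggproto("StatGroupVsOverallEcdf", Stat,
  required_aes = "x",
  compute_panel = function(data, scales) {
    # Overall empirical CDF across all groups in the panel
    data$x <- ecdf(data$x)(data$x)
    # Empirical CDF within each group, in the original row order
    data$y <- ave(data$x, data$group, FUN = function(v) ecdf(v)(v))
    data
  }
)

# Hypothetical layer constructor, drawing lines by default
stat_group_vs_overall_ecdf <- function(mapping = NULL, data = NULL,
                                       geom = "line", position = "identity",
                                       na.rm = FALSE, show.legend = NA,
                                       inherit.aes = TRUE, ...) {
  layer(stat = StatGroupVsOverallEcdf, data = data, mapping = mapping,
        geom = geom, position = position, show.legend = show.legend,
        inherit.aes = inherit.aes, params = list(na.rm = na.rm, ...))
}

my_data <- data.frame(class = sort(rep(x = c('a','b','c'), times = 3)),
                      hidden = c(10, 15, 80, 0, 50, 100, 5, 90, 95))

# No pre-computed columns: the stat derives both ECDFs from `hidden`
p <- ggplot(my_data, aes(x = hidden, color = class)) +
  stat_group_vs_overall_ecdf() +
  labs(x = "overall ECDF", y = "within-class ECDF")
```

The trade-off is the one the answer points to: no extra columns are needed in the data, but a small piece of ggproto code has to be maintained instead.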