Apply a function at group level in ggplot2

Question

Short Version

In ggplot2, what do I have to do to make a function used inside the x or y aesthetic be applied after the data are split into groups (e.g. by group or color)?

Long Version

I'm looking for a way to have ggplot apply a function within each group while plotting.

Motivating example

Suppose every member of a population has a hidden value. Only the rank (and therefore the CDF) of these hidden values is exposed.

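library(data.table)
library(ggplot2)

my_data <- data.table(class = sort(rep(x = c('a','b','c'), times = 3)))

hidden <- c(10, 15,  80,
             0, 50, 100,
             5, 90,  95)

# Only the rank (the overall CDF) of the hidden values is exposed
my_data[, rank := ecdf(hidden)(hidden)]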

I can use the overall CDF to infer the CDF within each class. I then want to plot each class's CDF against the overall CDF, which helps me see whether the distribution of the hidden value is consistent across classes.
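Inferring the class CDF from the overall CDF works because an empirical CDF depends only on the ordering of the values, and the overall rank preserves that ordering within every class. A minimal base-R check of this, assuming the toy values above (`ave()` is just one convenient split-apply helper; `grp`, `overall_rank` and `within_ecdf` are illustrative names):

```r
# Toy data from the question, as plain vectors
hidden <- c(10, 15,  80,
             0, 50, 100,
             5, 90,  95)
grp <- rep(c("a", "b", "c"), each = 3)
overall_rank <- ecdf(hidden)(hidden)  # the exposed overall CDF

# Within-class ECDF, computed once from the hidden values
# and once from the exposed overall ranks
within_ecdf <- function(v) ecdf(v)(v)
from_hidden <- ave(hidden, grp, FUN = within_ecdf)
from_rank   <- ave(overall_rank, grp, FUN = within_ecdf)

all.equal(from_hidden, from_rank)  # TRUE: only the ordering matters
```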

My Best Attempt

After several iterations, I'm surprised this doesn't work. I expected that setting the grouping in the top-level aesthetics (here via color) would make the function be applied per group, the same way stats are. Instead, ecdf(rank)(rank) is evaluated on the entire column, so y ends up equal to x.

ggplot(data = my_data, mapping = aes(color = class)) +
  geom_line(mapping = aes(
    x = rank,
    y = ecdf(rank)(rank)
  ))

(Plot: all three class lines lie on top of each other.)
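One way to confirm that the expression in aes() is evaluated before any grouping is to inspect the data ggplot2 builds for the layer with `layer_data()`. A minimal check, assuming the same toy values held in a plain data.frame rather than a data.table:

```r
library(ggplot2)

# Same toy data, held in a plain data.frame
my_data <- data.frame(class  = rep(c("a", "b", "c"), each = 3),
                      hidden = c(10, 15, 80, 0, 50, 100, 5, 90, 95))
my_data$rank <- ecdf(my_data$hidden)(my_data$hidden)  # overall CDF

p <- ggplot(my_data, aes(color = class)) +
  geom_line(aes(x = rank, y = ecdf(rank)(rank)))

# layer_data() returns the layer's data after aesthetics are evaluated
ld <- layer_data(p)
all.equal(ld$x, ld$y)  # TRUE: ecdf() saw all nine ranks, not one class at a time
```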

Here is an example where a stat is applied at the level of color.

ggplot(data = my_data, mapping = aes(color = class)) +
  geom_density(mapping = aes(
    x = rank,
    y = after_stat(scaled)  # older ggplot2 code writes this as ..scaled..
  ))


My Best Workaround

Through the magic of split-apply-combine (here done with the by argument of data.table), I can add an extra column to my data that holds each class's CDF.

# Within-class CDF, computed separately for each class
my_data[, class_rank := ecdf(rank)(rank), by = class]
ggplot(data = my_data, mapping = aes(color = class)) +
  geom_line(mapping = aes(
    x = rank,
    y = class_rank
  ))

(Plot: a separate CDF line for each class.)

Throwing extra columns onto my data isn't the worst thing, but ggplot2 does enough awesome stuff already that I feel like this is in there and I just can't find it.
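Since rank is already a column here, one option that may be worth checking is ggplot2's built-in `stat_ecdf()`, which computes the empirical CDF of x separately within each group. A sketch, assuming a ggplot2 version whose `stat_ecdf()` has the `pad` argument; `geom = "line"` is chosen so the result resembles the workaround's lines rather than the default steps:

```r
library(ggplot2)

my_data <- data.frame(class  = rep(c("a", "b", "c"), each = 3),
                      hidden = c(10, 15, 80, 0, 50, 100, 5, 90, 95))
my_data$rank <- ecdf(my_data$hidden)(my_data$hidden)  # overall CDF

# stat_ecdf() runs once per group (here: per class), so y is the class CDF.
# pad = FALSE drops the extra (-Inf, 0) and (Inf, 1) end points.
p <- ggplot(my_data, aes(x = rank, colour = class)) +
  stat_ecdf(geom = "line", pad = FALSE)
p
```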

Pedro J. Aphalo · Accepted Answer

Aesthetics are mapped to the value returned by evaluating the expression on the right-hand side of = against the data passed through the data argument. Grouping by mapping other aesthetics has no effect at this stage.
Grouping affects only operations within plot layers, so the only way of applying a function that respects grouping is within a ggplot statistic.
Good examples of statistics that apply functions are stat_summary() and stat_smooth().

Consequently, I think the only way of achieving what you ask within 'ggplot2' would be to use a statistic that doesn't yet exist. Defining a new statistic that summarizes the x aesthetic ignoring groups and the y aesthetic respecting the grouping should be doable, I think, but is it worth the effort? One can easily pre-process the data within the 'tidyverse', as shown below, or with 'data.table', as in your own example...

library(ggplot2)
library(dplyr)

my_data <- data.frame(class = sort(rep(x = c('a','b','c'), times = 3)),
                      hidden = c(10, 15,  80, 0, 50, 100, 5, 90,  95))

my_data %>%
  mutate(rank = ecdf(hidden)(hidden)) %>%
  group_by(class) %>%
  mutate(class_rank = ecdf(hidden)(hidden)) %>%
  ggplot(aes(rank, class_rank, color = class)) +
    geom_line()
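For readers who do want the statistic, here is a minimal sketch, assuming ggplot2 3.3 or later. `StatGroupEcdf` and `stat_group_ecdf()` are hypothetical names, not part of ggplot2. Because x is the precomputed overall rank, the stat only has to compute y for each group, which is what `compute_group()` is for:

```r
library(ggplot2)

my_data <- data.frame(class  = rep(c("a", "b", "c"), each = 3),
                      hidden = c(10, 15, 80, 0, 50, 100, 5, 90, 95))
my_data$rank <- ecdf(my_data$hidden)(my_data$hidden)  # overall CDF, computed once

# Hypothetical stat: for each group, sort the rows by x and
# add y = the within-group ECDF of x
StatGroupEcdf <- ggproto("StatGroupEcdf", Stat,
  required_aes = "x",
  compute_group = function(data, scales) {
    data <- data[order(data$x), , drop = FALSE]
    data$y <- ecdf(data$x)(data$x)
    data
  }
)

# Hypothetical layer function wrapping the stat, drawing with geom_line()
stat_group_ecdf <- function(mapping = NULL, data = NULL, geom = "line",
                            position = "identity", na.rm = FALSE,
                            show.legend = NA, inherit.aes = TRUE, ...) {
  layer(stat = StatGroupEcdf, data = data, mapping = mapping, geom = geom,
        position = position, show.legend = show.legend,
        inherit.aes = inherit.aes, params = list(na.rm = na.rm, ...))
}

p <- ggplot(my_data, aes(x = rank, colour = class)) +
  stat_group_ecdf()
p
```

ggplot2 calls `compute_group()` once per group, so the `ecdf()` call only ever sees one class's values, unlike an expression written directly inside `aes()`.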

Tags: r, ggplot2

Asked by Adam Hoelscher