Apply a function at group level in ggplot2

Tags:

r

ggplot2

Short Version

In ggplot2, how can I make a function used in the x or y of an aesthetic be applied after the data are split into groups (e.g. using group or color)?

Long Version

I'm trying to find a way to have ggplot apply a function within a group while plotting.

Motivating example

Suppose every member of a population has a hidden value. The rank (and therefore the CDF) of these hidden values is exposed.

library(data.table)
library(ggplot2)

my_data <- data.table(class = sort(rep(x = c('a','b','c'), times = 3)))

hidden <- c(10, 15,  80,
             0, 50, 100,
             5, 90,  95)

my_data[, rank := ecdf(hidden)(hidden)]

I can use the overall CDF to infer the CDF inside each class. I then want to graph each class's CDF against the overall CDF, which helps me see whether the distribution of the hidden value is consistent between classes.

My Best Attempt

After several iterations, I'm surprised this doesn't work. I would think that by setting the group at the highest level aesthetic, the function would be applied in the same way that stats are. Instead, ecdf(rank)(rank) is applied to the entire column again, which results in y being equal to x.

ggplot(data = my_data, mapping = aes(color = class)) +
  geom_line(mapping = aes(
    x = rank,
    y = ecdf(rank)(rank)
  ))

[Figure: lines all on top of each other]

Here is an example where a stat is applied at the level of color.

ggplot(data = my_data, mapping = aes(color = class)) +
  geom_density(mapping = aes(
    x = rank,
    y = after_stat(scaled)  # current spelling of the older ..scaled.. notation
  ))


My Best Workaround

Through the magic of split-apply-combine (here accomplished using by from data.table), I can add an extra column to my data that accomplishes this.

# ecdf() is evaluated once per class because of by = class
my_data[, class_rank := ecdf(rank)(rank), by = class]
ggplot(data = my_data, mapping = aes(color = class)) +
  geom_line(mapping = aes(
    x = rank,
    y = class_rank
  ))

[Figure: individual CDFs]

Throwing extra columns onto my data isn't the worst thing, but ggplot2 does enough awesome stuff already that I feel like this is in there and I just can't find it.

asked Apr 19 '26 20:04 by Adam Hoelscher


1 Answer

  1. Aesthetics are mapped to the values returned by evaluating the expression on the right-hand side of = against the data supplied through the data argument. Grouping introduced by mapping other aesthetics has no effect at this point.
  2. Grouping affects only operations within plot layers, so the only way to apply a function that respects grouping is within a ggplot statistic.
  3. Good examples of statistics that apply functions are stat_summary() and stat_smooth().
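
Points 1 and 2 can be checked by inspecting what ggplot2 actually computes for the layer. The following is a small sketch, assuming the question's data rebuilt as a plain data.frame (the name built is only illustrative); it uses layer_data() to pull out the values the geom receives.

```r
library(ggplot2)

# The question's data, rebuilt as a plain data.frame for this sketch
hidden <- c(10, 15, 80, 0, 50, 100, 5, 90, 95)
my_data <- data.frame(
  class = rep(c("a", "b", "c"), each = 3),
  rank  = ecdf(hidden)(hidden)
)

# The y expression is evaluated once, on all nine rows,
# before the color mapping splits the layer into groups
p <- ggplot(my_data, aes(x = rank, y = ecdf(rank)(rank), color = class)) +
  geom_line()

built <- layer_data(p)

# Compare the x and y values that reach the geom
all.equal(built$x, built$y)
```

If the comparison reports the two columns as equal, the expression saw the whole column rather than one class at a time, which matches the overlapping lines described in the question.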

Consequently I think the only way of achieving what you ask within 'ggplot2' would be to use an aesthetic that doesn't yet exist. Defining a new statistic that summarizes the x aesthetic ignoring groups and the y aesthetic respecting the grouping should be doable, I think, but is it worth the effort? One can easily pre-process the data within the 'tidyverse' as shown below or with 'data.table' as in your own example...

library(ggplot2)
library(dplyr)

my_data <- data.frame(class = sort(rep(x = c('a','b','c'), times = 3)),
                      hidden = c(10, 15,  80, 0, 50, 100, 5, 90,  95))

my_data %>%
  mutate(rank = ecdf(hidden)(hidden)) %>%
  group_by(class) %>%
  mutate(class_rank = ecdf(hidden)(hidden)) %>%
  ggplot(aes(rank, class_rank, color = class)) +
    geom_line()
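
For completeness, the "new statistic" route could look roughly like the sketch below. It is not part of ggplot2: StatGroupEcdf, stat_group_ecdf() and the computed column group_ecdf are hypothetical names introduced here. The sketch assumes x already holds the overall rank, and relies on compute_group() being called once per group so that ecdf() sees only that group's rows.

```r
library(ggplot2)

# Hypothetical stat (not part of ggplot2): y becomes the ECDF of x
# computed separately inside each group.
StatGroupEcdf <- ggproto("StatGroupEcdf", Stat,
  required_aes = "x",
  # y defaults to the column computed in compute_group()
  default_aes = aes(y = after_stat(group_ecdf)),
  compute_group = function(data, scales) {
    # data holds the rows of a single group only
    data <- data[order(data$x), , drop = FALSE]
    data$group_ecdf <- ecdf(data$x)(data$x)
    data
  }
)

# Hypothetical constructor wrapping the stat in a layer
stat_group_ecdf <- function(mapping = NULL, data = NULL, geom = "line",
                            position = "identity", na.rm = FALSE,
                            show.legend = NA, inherit.aes = TRUE, ...) {
  layer(
    stat = StatGroupEcdf, data = data, mapping = mapping, geom = geom,
    position = position, show.legend = show.legend,
    inherit.aes = inherit.aes, params = list(na.rm = na.rm, ...)
  )
}

# Usage with the question's data: x is the overall rank,
# y is filled in per class by the stat
hidden <- c(10, 15, 80, 0, 50, 100, 5, 90, 95)
my_data <- data.frame(
  class = rep(c("a", "b", "c"), each = 3),
  rank  = ecdf(hidden)(hidden)
)

p <- ggplot(my_data, aes(x = rank, color = class)) +
  stat_group_ecdf()

built <- layer_data(p)
```

Whether this is preferable to adding one column during pre-processing is a judgment call; the pre-processing version is shorter and easier to read.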

answered Apr 21 '26 17:04 by Pedro J. Aphalo


