Correct use of fun.data with stat_summary in ggplot2?

Question

From ?stat_summary:

fun.data : Complete summary function. Should take data frame as input and return data frame as output

I'm having trouble understanding this. It doesn't seem like my summary function so.summary is being passed a data frame at all!

Code:

set.seed(0)
so.example <- data.frame(
  sampleID=rep(1:15)
  , sales=runif(15, 0, 1)*1000
  , revenue=runif(15, 0, 1)*10000
)

so.summary <- function(z) {
  print(z)
  data.frame(sales=median(z$sales), revenue=median(z$revenue))
}

ggplot(
  so.example
  , aes(x=sales, y=revenue)
  ) + geom_point() + stat_summary(fun.data=so.summary, geom='point', color='red')

Output:

[1] 2672.207
Error in z$sales : $ operator is invalid for atomic vectors

logworthy · Accepted Answer

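fun.data summarises y at each x. It takes a vector of the y values at that x as input and returns a data frame whose columns are named after aesthetics (for example y, ymin, ymax).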

One use case is for mapping different summary statistics to different aesthetics:

set.seed(0)
week <- floor(runif(30, 1, 5))
sales <- week * runif(30, 0, 1)*10000
so.example <- data.frame(week=week, sales=sales)

so.summary <- function(y) {
  return(data.frame(y=median(y), size=length(y), alpha=sd(y)/10000))
}

ggplot(
  so.example
  , aes(x=week, y=sales)
) + geom_point() + stat_summary(fun.data=so.summary, geom='point', colour='red')
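To see what fun.data receives without drawing anything, the per-x splitting can be imitated in base R. This is only a sketch of the behaviour described above, not ggplot2's actual implementation; median_range is a hypothetical name introduced here, and y, ymin, ymax are the column names a range-style geom conventionally expects.

```r
# Hypothetical summary function: takes a numeric vector of y values and
# returns a one-row data frame whose columns are named after aesthetics.
median_range <- function(y) {
  data.frame(y = median(y), ymin = min(y), ymax = max(y))
}

set.seed(0)
week <- floor(runif(30, 1, 5))
sales <- week * runif(30, 0, 1) * 10000

# Imitate "summarise y at each x": split y by x, then summarise each piece.
pieces <- split(sales, week)
summary_by_week <- do.call(rbind, lapply(pieces, median_range))
summary_by_week$x <- as.numeric(names(pieces))
print(summary_by_week)
```

Passing a function of this shape as fun.data with geom='pointrange' should then draw one median-and-range marker per week.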

autorobin · Answer

Let's say you really want to use stat_summary to plot that summary point. As logworthy noted, fun.data will receive a y vector for every unique x value. More broadly, it splits up the aesthetics data.frame for each unique x value (we'll use this fact later). But if you move your aesthetics around, you can let stat_summary see the entire revenue vector.

# Note: ggplot2 3.3.0 and later deprecate fun.y in favour of fun.
ggplot(so.example) +
  geom_point(aes(x=sales, y=revenue)) +
  stat_summary(aes(x= median(sales), y= revenue), fun.y= median, geom= 'point', color= 'red')

Now stat_summary 'sees' a single x value, so the entire y vector reaches fun.y in one piece.
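If the goal is just one red point at the medians of both variables, a plainer route is to compute the summary outside ggplot and draw it as a second layer. A minimal sketch, assuming ggplot2 is installed; so.medians is a name introduced here for illustration and is not part of the original code.

```r
set.seed(0)
so.example <- data.frame(
  sampleID = 1:15,
  sales = runif(15, 0, 1) * 1000,
  revenue = runif(15, 0, 1) * 10000
)

# One-row data frame holding the median of each variable.
# (so.medians is a hypothetical helper name.)
so.medians <- data.frame(
  sales = median(so.example$sales),
  revenue = median(so.example$revenue)
)

# Plot only if ggplot2 is available, so the summary step runs regardless.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(so.example, aes(x = sales, y = revenue)) +
    geom_point() +
    geom_point(data = so.medians, colour = "red")
  print(p)
}
```

The second geom_point inherits the x and y mappings from the ggplot() call, which is why so.medians reuses the column names sales and revenue.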

If that's not fun enough, you can trick fun.data into seeing other vectors of your data frame. For example, let's say you needed a weighted mean instead of a median.

set.seed(0)
so.example <- data.frame(
  sampleID=rep(1:15)
  , sales=runif(15, 0, 1)*1000
  , revenue=runif(15, 0, 1)*10000
  , weight= runif(15, 0, 1)
)

so.mean.weight <- function(x, wt){ sum(x*wt)/sum(wt) }

I wrote a simple weighted-mean function, but you could just as easily use weighted.mean from stats. More importantly, the function is arbitrary as long as it returns a single value for its inputs.
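As a quick check of that claim, the hand-written function can be compared with stats::weighted.mean on the same inputs. The vectors below are made up for the comparison and are not the exact columns of so.example.

```r
so.mean.weight <- function(x, wt) { sum(x * wt) / sum(wt) }

set.seed(0)
revenue <- runif(15, 0, 1) * 10000
weight <- runif(15, 0, 1)

hand_rolled <- so.mean.weight(revenue, weight)
built_in <- weighted.mean(revenue, weight)

# The two should agree up to floating-point error.
print(all.equal(hand_rolled, built_in))
```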

The trick for letting fun.data see other vectors in the data.frame is to grab extra aesthetics from parent.frame() inside the function. This is pretty hacky: you'll get warning messages, and it depends on the calling frame containing a variable named df. Let's take a look at the new stat_summary call:

stat_summary(
    aes(x= so.mean.weight(sales, weight), y= revenue, wt= weight) 
    , fun.data= so.summary
    , geom= 'point'
    , color= 'darkgreen'
    , size= 2
    )

Notice the wt= weight input in the aes. geom_point has no wt aesthetic (you want to choose a name that you know is not an aesthetic; foo or bar would have done just as well), so you'll get a nice warning message telling you it will be ignored. It will be ignored, but not removed/deleted.

With that in mind, let's take a look inside the new so.summary function:

so.summary <- function(z) {
  # Grab aesthetic DF
  aesDF <- parent.frame()$df
  print(names(aesDF)) # returns input aesthetic vectors: "x", "y", "group", and "wt"
  pnt <- data.frame(y= so.mean.weight(z, aesDF$wt ) )
  print(pnt)
  pnt
}

You can see that the call to parent.frame()$df grants access to the aesthetic vectors for use in your calculations.
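The mechanism is ordinary R scoping rather than anything specific to ggplot2: parent.frame() returns the environment of whichever function called the current one, so a local variable there can be read by name. A toy illustration with two hypothetical functions, caller and peek; it says nothing about which variables ggplot2 actually keeps in that frame, which may differ between versions.

```r
# peek() reads a variable named `df` from the environment of its caller.
peek <- function(z) {
  # Capture parent.frame() first so it refers to peek's caller.
  caller_env <- parent.frame()
  aesDF <- caller_env$df
  sum(z * aesDF$wt) / sum(aesDF$wt)
}

# caller() holds a local `df` and passes only the y vector to peek().
caller <- function() {
  df <- data.frame(y = c(10, 20, 30), wt = c(1, 1, 2))
  peek(df$y)
}

result <- caller()
print(result)  # (10*1 + 20*1 + 30*2) / 4 = 22.5
```

If the caller had no local df, caller_env$df would be NULL and the weighted sum would not be meaningful, which is one reason the approach is fragile.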
