
Functions inside aes

Tags: r, ggplot2

Question: why can't I call sapply inside aes()?

Goal of the following figure: create a histogram showing the proportion that died/lived, such that the proportions for each combination of group/type sum to 1 (example inspired by a previous post).

I know you could make the figure by summarising outside of ggplot, but the question is really about why the function isn't working inside of aes().

library(ggplot2)

## Data
set.seed(999)
dat <- data.frame(group = factor(rep(1:2, 25)),
                  type  = factor(sample(1:2, 50, replace = TRUE)),
                  died  = factor(sample(0:1, 50, replace = TRUE)))

## Setup the figure
p <- ggplot(dat, aes(x=died, group=interaction(group, type), fill=group, alpha=type)) +
  theme_bw() +
  scale_alpha_discrete(range=c(0.5, 1)) +
  ylab("Proportion")

## Proportions, all groups/types together sum to 1 (not wanted)
p + geom_histogram(aes(y=..count../sum(..count..)), position=position_dodge())


## Look at groups
stuff <- ggplot_build(p)
stuff$data[[1]]

## The long way works: proportions by group/type
p + geom_histogram(
    aes(y=c(..count..[..group..==1] / sum(..count..[..group..==1]),
            ..count..[..group..==2] / sum(..count..[..group..==2]),
            ..count..[..group..==3] / sum(..count..[..group..==3]),
            ..count..[..group..==4] / sum(..count..[..group..==4]))),
        position='dodge'
)


## Why can't I call sapply there?
p + geom_histogram(
    aes(y=sapply(unique(..group..), function(g)
        ..count..[..group..==g] / sum(..count..[..group..==g]))),
        position='dodge'
)

Error in get(as.character(FUN), mode = "function", envir = envir) : object 'expr' of mode 'function' was not found

Rorschach asked Jul 03 '15 17:07




2 Answers

So, the issue arises because ggplot2 calls the internal, recursive function ggplot2:::strip_dots on any aesthetic that includes a 'calculated aesthetic'. There is some discussion of calculated aesthetics in this SO question and answer. The relevant code in layer.r is:

new <- strip_dots(aesthetics[is_calculated_aes(aesthetics)])

i.e. strip_dots is called only if there are calculated aesthetics, which are detected using the regex "\\.\\.([a-zA-z._]+)\\.\\.".
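To see what that regex matches on its own, here is a quick base R sketch; the vector of names is made up for illustration. Note that the `A-z` range in the quoted pattern also matches a handful of punctuation characters (`[`, `\`, `]`, `^`, `_` and the backtick), which is harmless here.

```r
# Regex quoted above for detecting calculated aesthetics such as ..count..
match_calculated_aes <- "\\.\\.([a-zA-z._]+)\\.\\."

# Made-up names: two calculated aesthetics and two ordinary ones
vars <- c("..count..", "count", "..density..", "group")

# Which names count as calculated aesthetics?
is_calc <- grepl(match_calculated_aes, vars)
print(is_calc)    # TRUE FALSE TRUE FALSE

# Strip the surrounding dots, keeping only the captured name
stripped <- gsub(match_calculated_aes, "\\1", vars)
print(stripped)   # "count" "count" "density" "group"
```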

strip_dots takes a recursive approach, working down through the nested calls and stripping out the dots. The code looks like this:

function (expr) 
{
    if (is.atomic(expr)) {
        expr
    }
    else if (is.name(expr)) {
        as.name(gsub(match_calculated_aes, "\\1", as.character(expr)))
    }
    else if (is.call(expr)) {
        expr[-1] <- lapply(expr[-1], strip_dots)
        expr
    }
    else if (is.pairlist(expr)) {
        as.pairlist(lapply(expr, expr))
    }
    else if (is.list(expr)) {
        lapply(expr, strip_dots)
    }
    else {
        stop("Unknown input:", class(expr)[1])
    }
}
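As a sanity check that the recursion itself is fine for ordinary expressions, here is a standalone copy of the source quoted above (strip_dots_copy is a hypothetical name, and match_calculated_aes is redefined from the regex above), applied to a typical calculated-aesthetic expression that contains no pairlists.

```r
# Regex quoted above
match_calculated_aes <- "\\.\\.([a-zA-z._]+)\\.\\."

# Standalone copy of the strip_dots source quoted above (hypothetical name)
strip_dots_copy <- function(expr) {
  if (is.atomic(expr)) {
    expr
  } else if (is.name(expr)) {
    # ..count.. -> count; other names pass through unchanged
    as.name(gsub(match_calculated_aes, "\\1", as.character(expr)))
  } else if (is.call(expr)) {
    # Recurse into every argument, leaving the called function (element 1) alone
    expr[-1] <- lapply(expr[-1], strip_dots_copy)
    expr
  } else if (is.pairlist(expr)) {
    as.pairlist(lapply(expr, expr))  # copied as quoted; this is the suspect line
  } else if (is.list(expr)) {
    lapply(expr, strip_dots_copy)
  } else {
    stop("Unknown input:", class(expr)[1])
  }
}

# No pairlist is involved here, so the suspect branch is never reached
stripped <- strip_dots_copy(quote(..count.. / sum(..count..)))
print(stripped)  # count/sum(count)
```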

If we pass an anonymous function to this code as follows:

anon <- as.call(quote(function(g) mean(g)))
ggplot2:::strip_dots(anon)

we reproduce the error:

#Error in get(as.character(FUN), mode = "function", envir = envir) : 
#  object 'expr' of mode 'function' was not found

Working through this, we can see that anon is a call. For calls, strip_dots uses lapply to call strip_dots on every element after the first. For an anonymous function like this, the second element is the formals of the function. If we look at the formals of anon using dput(formals(eval(anon))) or dput(anon[[2]]), we see this:

#pairlist(g = )
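A short sketch of how the quoted anonymous function is laid out, using only base R. The `g = ` with nothing after it in the output above is a formal argument with no default value.

```r
anon <- quote(function(g) mean(g))

# The quoted anonymous function is itself a call to `function`
print(is.call(anon))        # TRUE
print(anon[[1]])

# Second element: the formals, stored as a pairlist
fmls <- anon[[2]]
print(is.pairlist(fmls))    # TRUE
print(names(fmls))          # "g"

# Third element: the body of the function
print(anon[[3]])            # mean(g)
```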

For pairlists, strip_dots calls lapply(expr, expr), i.e. it tries to apply the pairlist to itself as though it were a function. I'm not sure why this code is there, but in this circumstance it certainly leads to the error:

expr <- anon[[2]]
lapply(expr, expr)

# Error in get(as.character(FUN), mode = "function", envir = envir) : 
#  object 'expr' of mode 'function' was not found

TL;DR: At this stage, ggplot2 doesn't support the use of anonymous functions within aes() where a calculated aesthetic (such as ..count..) is used.

Anyway, the desired end result can be achieved using dplyr; in general I think it makes for more readable code to separate out the data summarisation from the plotting:

library(dplyr)

newDat <- dat %>%
  group_by(died, type, group) %>%
  summarise(count = n()) %>%
  group_by(type, group) %>%
  mutate(Proportion = count / sum(count))

p <- ggplot(newDat, aes(x = died, y = Proportion, group = interaction(group, type), fill=group, alpha=type)) +
  theme_bw() +
  scale_alpha_discrete(range=c(0.5, 1)) +
  geom_bar(stat = "identity", position = "dodge")
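If dplyr isn't available, the same proportions can be computed in base R. This is a sketch using table() and ave(), which are swapped in here rather than taken from the answer; it rebuilds dat so it runs on its own.

```r
set.seed(999)
dat <- data.frame(group = factor(rep(1:2, 25)),
                  type  = factor(sample(1:2, 50, replace = TRUE)),
                  died  = factor(sample(0:1, 50, replace = TRUE)))

# Count every died/type/group combination (zero counts are kept as rows)
counts <- as.data.frame(table(died = dat$died, type = dat$type, group = dat$group))

# Divide each count by the total of its type/group combination
counts$Proportion <- ave(counts$Freq, counts$type, counts$group,
                         FUN = function(x) x / sum(x))
```

One difference from the dplyr version: table() keeps combinations that never occur (with Proportion 0), whereas summarise(n()) only produces rows for combinations present in the data.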


ggplot2 fix

I've forked ggplot2 and made two changes to aes_calculated.r that fix the problem. The first corrects the handling of pairlists so that lapply applies strip_dots, rather than expr, to each element, which I think must have been the intended behaviour. The second handles formal arguments with no default value (as in the examples here). For these, expr is the empty name, so as.character(expr) gives "" and as.name() then throws an error: the empty name is a valid construct, but it cannot be created from an empty string.

The forked version of ggplot2 is at https://github.com/NikNakk/ggplot2, and I have just made a pull request.
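A minimal sketch of what those two changes could look like, written as a standalone function. strip_dots_fixed is a hypothetical name and this is not the code from the fork; it just applies the two fixes described above to the quoted source.

```r
match_calculated_aes <- "\\.\\.([a-zA-z._]+)\\.\\."

# Hypothetical patched copy of strip_dots
strip_dots_fixed <- function(expr) {
  if (is.atomic(expr)) {
    expr
  } else if (is.name(expr)) {
    expr_ch <- as.character(expr)
    if (nchar(expr_ch) > 0) {
      # Ordinary symbol: strip the surrounding dots, e.g. ..count.. -> count
      as.name(gsub(match_calculated_aes, "\\1", expr_ch))
    } else {
      # Fix 2: the empty name (a formal with no default) is returned unchanged,
      # because as.name("") would fail
      expr
    }
  } else if (is.call(expr)) {
    expr[-1] <- lapply(expr[-1], strip_dots_fixed)
    expr
  } else if (is.pairlist(expr)) {
    # Fix 1: apply strip_dots_fixed to each element, not `expr` itself
    as.pairlist(lapply(expr, strip_dots_fixed))
  } else if (is.list(expr)) {
    lapply(expr, strip_dots_fixed)
  } else {
    stop("Unknown input: ", class(expr)[1])
  }
}

anon <- quote(function(g) ..count..[..group.. == g])
fixed <- strip_dots_fixed(anon)
print(fixed)
```

The formals come back intact, so the result can still be evaluated into a working function.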

Finally, even with that fix, the sapply example given doesn't work, because it returns a matrix with 2 rows and 4 columns rather than a vector of length 8. The corrected version is like this:

p + geom_histogram(
    aes(y=unlist(lapply(unique(..group..), function(g)
        ..count..[..group..==g] / sum(..count..[..group..==g])))),
    position='dodge'
)

This gives the same output as the dplyr solution above.
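The shape difference is easy to reproduce outside ggplot2. In this sketch the plain vectors count and group are made-up stand-ins for ..count.. and ..group.., with two rows per group as in the histogram data.

```r
# Made-up stand-ins for ..count.. and ..group.. (two rows per group)
count <- c(2, 6, 1, 3, 4, 4, 5, 5)
group <- rep(1:4, each = 2)

per_group <- function(g) count[group == g] / sum(count[group == g])

# sapply simplifies four length-2 results into a 2 x 4 matrix
m <- sapply(unique(group), per_group)
print(dim(m))     # 2 4

# unlist(lapply(...)) gives the length-8 vector that aes() needs
v <- unlist(lapply(unique(group), per_group))
print(length(v))  # 8
```

Flattening the matrix with as.vector(m) would give the same values in the same order, since R stores matrices column by column.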

One other thing to note is that this lapply code assumes that the data at that stage is sorted by group. I think this is always the case, but if for whatever reason it weren't, you would end up with the y data out of order. An alternative that preserves the order of the rows in the calculated data would be:

p + geom_histogram(
  aes(y = {
    grp_total <- tapply(..count.., ..group.., sum)
    ..count.. / grp_total[as.character(..group..)]
  }),
  position = 'dodge'
)
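A quick check that the tapply approach does not depend on row order. The plain vectors count and group are made-up stand-ins for ..count.. and ..group.., deliberately left unsorted; ave() is included only as a cross-check.

```r
# Made-up stand-ins, deliberately not sorted by group
count <- c(3, 5, 1, 2, 2, 1)
group <- c(2, 1, 1, 3, 2, 3)

# Total per group, looked up by name for each row
grp_total <- tapply(count, group, sum)
prop <- count / grp_total[as.character(group)]
print(as.vector(prop))  # each count divided by its own group's total

# Cross-check with ave(), which also keeps the original row order
same <- ave(count, group, FUN = function(x) x / sum(x))
```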

It's also worth being aware that these expressions are evaluated in baseenv(), the environment of the base package. This means that any functions from other packages, even standard ones like stats and utils, need to be used with the :: operator (e.g. stats::rnorm).

Nick Kennedy answered Oct 01 '22 17:10


After playing around a little, the problem appears to be using anonymous functions with ..group.. or ..count.. inside aes:

xy <- data.frame(x=1:10,y=1:10) #data

ggplot(xy, aes(x = x, y = sapply(y, mean))) + geom_line() #sapply is fine

ggplot(xy, aes(x = x, group = y)) + 
       geom_bar(aes(y = sapply(..group.., mean))) #sapply with ..group.. is fine

ggplot(xy, aes(x = x, group = y)) + 
       geom_bar(aes(y = sapply(..group.., function(g) {mean(g)})))
#broken, with same error

ggplot(xy, aes(x = x, group = y)) + 
    geom_bar(aes(y = sapply(y, function(g) {mean(g)})), stat = "identity")
#sapply with anonymous functions works fine!

It seems like a really weird bug, unless I'm missing something stupid.

jeremycg answered Oct 01 '22 17:10