 

How is jitter determined in ggplot?

I was looking over the documentation on jitter in ggplot while making some plots, and I realized that I don't really understand the arguments.

It states that the arguments are:

width: degree of jitter in x direction. Defaults to 40% of the resolution of the data.

height: degree of jitter in y direction. Defaults to 40% of the resolution of the data.

My question is, what exactly is resolution, and how is it determined?

You can also override the default and provide your own value, as in the example below, which uses 0.1:

geom_point(position = position_jitter(w = 0.1, h = 0.1))

What units does 0.1 have? Am I right to assume that it is some proportion of the resolution?

tumultous_rooster asked Apr 25 '15 at 02:04



1 Answer

If we look at the ggplot2 source (this is the older, proto-based implementation used before ggplot2 2.0.0), we first find this:

PositionJitter <- proto(Position, {
  objname <- "jitter"

  adjust <- function(., data) {
    if (empty(data)) return(data.frame())
    check_required_aesthetics(c("x", "y"), names(data), "position_jitter")

    if (is.null(.$width)) .$width <- resolution(data$x, zero = FALSE) * 0.4
    if (is.null(.$height)) .$height <- resolution(data$y, zero = FALSE) * 0.4

    trans_x <- NULL
    trans_y <- NULL
    if(.$width > 0) {
      trans_x <- function(x) jitter(x, amount = .$width)
    }
    if(.$height > 0) {
      trans_y <- function(x) jitter(x, amount = .$height)
    }

    transform_position(data, trans_x, trans_y)
  }

})

And wouldn't you know it, resolution is an exported function (you could also just search the sources for it and land in the same place):

function (x, zero = TRUE) 
{
    if (is.integer(x) || zero_range(range(x, na.rm = TRUE))) 
        return(1)
    x <- unique(as.numeric(x))
    if (zero) {
        x <- unique(c(0, x))
    }
    min(diff(sort(x)))
}

So...there you go!

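"resolution" in this context then roughly means "the smallest distance between any two distinct values in a vector". There are two special cases: an integer vector, or one whose values are all (effectively) the same, gets a resolution of 1. With zero = TRUE, 0 is also counted as one of the values, but position_jitter calls it with zero = FALSE.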
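A quick way to get a feel for this is to re-create the function in base R and try it on a few vectors. This is only a sketch, not ggplot2 itself: resolution_sketch() is a hypothetical name, and the zero_range() check from the scales package is approximated here with all.equal(), which is an assumption. The last line computes the default that position_jitter would use, 40% of the resolution with zero = FALSE.

```r
# Hypothetical re-creation of ggplot2's resolution(), for illustration only.
# ASSUMPTION: scales::zero_range() is approximated by an all.equal() check.
resolution_sketch <- function(x, zero = TRUE) {
  rng <- range(x, na.rm = TRUE)
  if (is.integer(x) || isTRUE(all.equal(rng[1], rng[2]))) return(1)
  x <- unique(as.numeric(x))
  if (zero) x <- unique(c(0, x))   # optionally count 0 as one of the values
  min(diff(sort(x)))               # smallest gap between adjacent distinct values
}

resolution_sketch(c(1, 2, 4), zero = FALSE)        # 1
resolution_sketch(c(0.5, 1, 1.25), zero = FALSE)   # 0.25
resolution_sketch(c(3L, 10L), zero = FALSE)        # 1 (integer input)
resolution_sketch(c(7, 7, 7), zero = FALSE)        # 1 (all values equal)
resolution_sketch(c(0.5, 10), zero = FALSE)        # 9.5
resolution_sketch(c(0.5, 10), zero = TRUE)         # 0.5 (0 is added first)

# Default jitter width as computed in PositionJitter's adjust():
0.4 * resolution_sketch(c(0.5, 1, 1.25), zero = FALSE)  # 0.1
```

Because the default is 0.4 of the resolution, the points around two neighbouring x values (one resolution apart) each spread over at most 0.8 of that gap, so the two clouds do not run into each other.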

This value (40% of the resolution) is then passed on as the amount argument to jitter, which has its own little song and dance:

The result, say r, is r <- x + runif(n, -a, a) where n <- length(x) and a is the amount argument (if specified).

Let z <- max(x) - min(x) (assuming the usual case). The amount a to be added is either provided as positive argument amount or otherwise computed from z, as follows:

If amount == 0, we set a <- factor * z/50 (same as S).

If amount is NULL (default), we set a <- factor * d/5 where d is the smallest difference between adjacent unique (apart from fuzz) x values.
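Putting the pieces together: position_jitter hands its width and height to jitter() as amount, and that value is positive, so neither of the factor-based rules above applies. It is used directly as a, in the same units as the x (or y) positions themselves. So the 0.1 in position_jitter(w = 0.1, h = 0.1) is not a proportion of the resolution; each point is moved by a uniform random offset between -0.1 and 0.1, a total spread of 0.2. The base-R sketch below checks the three amount branches empirically; the vectors and the seed are arbitrary choices for illustration.

```r
set.seed(1)

# 1. Positive amount (what position_jitter passes): offsets lie in [-amount, amount].
x <- rep(c(1, 2, 3), each = 100)
y1 <- jitter(x, amount = 0.1)
max(abs(y1 - x))        # no larger than 0.1

# 2. amount = 0: a = factor * z / 50, with z = max(x) - min(x).
x2 <- rep(c(0, 10), each = 100)   # z = 10, so a = 1 * 10 / 50 = 0.2
y2 <- jitter(x2, amount = 0)
max(abs(y2 - x2))       # no larger than 0.2

# 3. amount = NULL (jitter's default): a = factor * d / 5, where d is the
#    smallest gap between adjacent unique values.
x3 <- rep(c(1, 2, 4), each = 100) # d = 1, so a = 1 * 1 / 5 = 0.2
y3 <- jitter(x3)
max(abs(y3 - x3))       # no larger than 0.2
```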

joran answered Sep 28 '22 at 17:09
