
What does the "jitter" function do in R?

Tags:

r

According to the documentation, the explanation for the jitter function is "Add a small amount of noise to a numeric vector."

What does this mean?

Is a random number associated with each number in the vector and added to it?

blue-sky asked Jul 09 '13 12:07


2 Answers

Jittering does indeed mean adding random noise to a vector of numeric values. By default, the jitter function does this by drawing samples from a uniform distribution. If the amount parameter is not provided, the range of the noise is chosen according to the data.
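As a minimal sketch of the point above (the vector x and the seed are arbitrary choices for illustration, not from the question): with an explicit amount, jitter(x, amount = a) should give the same result as adding runif(n, -a, a) by hand under the same seed. The default amount described in ?jitter is factor/5 times the smallest gap between distinct values, which works out to 0.2 for this x.

```r
x <- c(1, 1, 2, 2, 3, 3)

# Explicit amount: jitter() adds runif(n, -amount, amount) to x.
set.seed(42)
jittered <- jitter(x, amount = 0.2)

# The same draw done by hand, using the same seed.
set.seed(42)
by_hand <- x + runif(length(x), min = -0.2, max = 0.2)

print(all.equal(jittered, by_hand))

# Default amount: factor/5 times the smallest gap between distinct values,
# i.e. 1/5 * 1 = 0.2 for this x, so no value moves by more than 0.2.
default_jittered <- jitter(x)
print(max(abs(default_jittered - x)) <= 0.2)
```

So the answer to the question is yes: each element gets its own independent random offset, which is added to it.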

I think the term 'jittering' also covers distributions other than the uniform. It is typically used to better visualize overlapping values, such as integer covariates, which helps show where the density of observations is high. It is good practice to mention in the figure legend if some of the values have been jittered, even if it is obvious. Here is an example visualization using the jitter function as well as normally distributed jitter, for which I arbitrarily chose sd=0.1:

n <- 500
set.seed(1)
# Three integer groups (1, 2, 3), each with n squared normal draws
dat <- data.frame(integer = rep(1:3, each=n), continuous = c(rnorm(n, mean=1), rnorm(n, mean=2), rnorm(n, mean=3))^2)

# Stack three plots vertically: raw, uniform jitter, normal jitter
par(mfrow=c(3,1))
plot(dat, main="No jitter for x-axis", xlab="Integer", ylab="Continuous")
plot(jitter(dat[,1]), dat[,2], main="Jittered x-axis (uniform distr.)", xlab="Integer", ylab="Continuous")
plot(dat[,1]+rnorm(3*n, sd=0.1), dat[,2], main="Jittered x-axis (normal distr.)", xlab="Integer", ylab="Continuous")

[Figure: the three plots produced by the code above]

Teemu Daniel Laajala answered Nov 16 '22 02:11


A really good explanation of the jitter effect and why it is needed can be found in the swirl course on Regression Models in R.

It takes Sir Francis Galton's data on the relationship between the heights of parents and their children and plots it first without jitter and then with jitter.

This is the one without jitter (plot(child ~ parent, galton)):

[Figure: child vs. parent height, without jitter]

This is the one with jitter (please ignore the regression lines) (plot(jitter(child,4) ~ parent,galton)):

[Figure: child vs. parent height, with jitter]

The course says that, without jitter, many people have the same height, so points fall on top of each other, which is why some of the circles in the first plot look darker than others. By applying R's jitter function to the children's heights, we can spread out the data to simulate measurement error and make high-frequency heights more visible.
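The overplotting effect can be reproduced without the galton data set. The sketch below uses simulated heights rounded to whole inches as a hypothetical stand-in (the means, standard deviations, and sample size are assumptions for illustration, not Galton's values) and counts how many distinct plotting positions exist before and after jittering.

```r
set.seed(1)

# Hypothetical stand-in for the Galton data: heights rounded to whole inches.
parent <- round(rnorm(200, mean = 68, sd = 2))
child  <- round(rnorm(200, mean = 68, sd = 2.5))

# Many (parent, child) pairs coincide, so their points are drawn on top of
# each other and only the distinct pairs are visible.
n_distinct_raw <- nrow(unique(data.frame(parent, child)))

# Same call shape as in the course: jitter the child heights with factor 4.
child_jittered <- jitter(child, factor = 4)
n_distinct_jit <- nrow(unique(data.frame(parent, child_jittered)))

cat(n_distinct_raw, "distinct points before jitter,", n_distinct_jit, "after\n")

# Side-by-side comparison of the two scatterplots.
par(mfrow = c(1, 2))
plot(child ~ parent, main = "No jitter")
plot(child_jittered ~ parent, main = "jitter(child, 4)")
```

Because only the y-values are jittered here, the points still line up in vertical columns at each parent height; jittering parent as well would spread them in both directions.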

Simon answered Nov 16 '22 01:11