How to make a scatterplot with geom_jitter reproducible?

I am using the Australian AIDS Survival Data, this time to create scatterplots.

To show how sex and survival status vary across the reported transmission categories (T.categ), I plot the chart this way:

library(ggplot2)
library(dplyr)  # provides the %>% pipe

data <- read.csv("https://raw.githubusercontent.com/vincentarelbundock/Rdatasets/master/csv/MASS/Aids2.csv")

data %>%
  ggplot() +
  geom_jitter(aes(T.categ, sex, colour = status))

It shows a chart, but each time I run the code it seems to produce a different one. Here are two of them put together.

[image: two runs of the same plot with differently placed points]

Is anything wrong with the code? Is it normal for each run to produce a different chart?

Mark K asked Feb 16 '18 08:02


2 Answers

If you use geom_point instead of geom_jitter, you can pass position = position_jitter(), which accepts a seed argument:

library(ggplot2)
p <- ggplot(mtcars, aes(as.factor(cyl), disp)) 

p + geom_point(position = position_jitter(seed = 42))


p + geom_point(position = position_jitter(seed = 1))

And back to seed 42, which gives the same point positions as the first call:

p + geom_point(position = position_jitter(seed = 42))

Created on 2020-07-02 by the reprex package (v0.3.0)
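
The same idea can be applied to the plot from the question. The following is only a sketch: `toy` is a small made-up data frame that reuses the question's column names (T.categ, sex, status) so the example runs without a download, and `make_plot` is a hypothetical helper. It uses layer_data() to read back the jittered coordinates, so the effect of the seed can be compared without looking at the plots.

```r
library(ggplot2)

# Hypothetical stand-in for the Aids2 data: same column names, made-up rows.
toy <- data.frame(
  T.categ = c("hs", "hs", "blood", "het", "hs", "blood"),
  sex     = c("M", "M", "F", "F", "M", "M"),
  status  = c("D", "A", "D", "A", "D", "D")
)

# Build the question's plot with geom_point plus a seeded jitter position.
make_plot <- function(seed) {
  ggplot(toy, aes(T.categ, sex, colour = status)) +
    geom_point(position = position_jitter(seed = seed))
}

# layer_data() returns the data a layer actually draws,
# including the jittered x and y positions.
run1 <- layer_data(make_plot(42))[c("x", "y")]
run2 <- layer_data(make_plot(42))[c("x", "y")]
run3 <- layer_data(make_plot(1))[c("x", "y")]

same_seed_matches <- identical(run1, run2)  # same seed, same positions
other_seed_differs <- !identical(run1, run3)  # another seed moves the points
print(same_seed_matches)
print(other_seed_differs)
```

Replacing `toy` with the data frame read from the CSV in the question should give a reproducible version of the original chart.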

tjebo answered Nov 09 '22 21:11


Try setting the seed when plotting:

set.seed(1)
data %>%
  ggplot() +
  geom_jitter(aes(T.categ, sex, colour = status))

From the manual ?geom_jitter:

It adds a small amount of random variation to the location of each point, and is a useful way of handling overplotting caused by discreteness in smaller datasets.

To make that "random variation" reproducible, we need to call set.seed() with the same value right before plotting, every time the plot is drawn.
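
A rough way to check this, assuming a small made-up data frame `toy` in place of the Aids2 data and a hypothetical helper `jittered_xy` that builds the plot and returns the jittered coordinates through layer_data():

```r
library(ggplot2)

# Hypothetical stand-in for the Aids2 data: same column names, made-up rows.
toy <- data.frame(
  T.categ = c("hs", "hs", "blood", "het", "hs", "blood"),
  sex     = c("M", "M", "F", "F", "M", "M"),
  status  = c("D", "A", "D", "A", "D", "D")
)

# Build the plot from the question and return the jittered positions.
jittered_xy <- function() {
  p <- ggplot(toy) +
    geom_jitter(aes(T.categ, sex, colour = status))
  layer_data(p)[c("x", "y")]
}

set.seed(1)
first <- jittered_xy()

set.seed(1)              # reset the seed before the second run
second <- jittered_xy()

third <- jittered_xy()   # no reset, so the random state has moved on

reset_matches <- identical(first, second)
no_reset_differs <- !identical(first, third)
print(reset_matches)
print(no_reset_differs)
```

The third run is there to show that setting the seed once at the top of a script is not enough if the plot is drawn several times: the seed has to be reset before each draw that should match.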

zx8754 answered Nov 09 '22 21:11