My goal is to reproduce this figure (Douglas A. Lind, William G. Marchal, Samuel A. Wathen, Statistical Techniques in Business and Economics, McGraw-Hill, 17th edition) with ggplot2 (author: Hadley Wickham).
Here is my effort based on geom_point() and some ugly data preparation (see code further down):
How could I do that with geom_dotplot()?
In my attempts I ran into several problems: (1) mapping the default density produced by geom_dotplot() to a count, (2) cutting off the axis, and (3) avoiding unexpected holes. I gave up and hacked geom_point() instead.
I expected (and still hope) it would be as simple as
ggplot(data, aes(x,y)) + geom_dotplot(stat = "identity")
but no. So here's what I've tried and the output:
# Data
df <- structure(list(x = c(79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105), y = c(1, 0, 0, 2, 1, 2, 7, 3, 7, 9, 11, 12, 15, 8, 10, 13, 11, 8, 9, 2, 3, 2, 1, 3, 0, 1, 1)), class = "data.frame", row.names = c(NA, -27L))
# dotplot based on geom_dotplot
geom_dots <- function(x, count, round = 10, breaks = NULL, ...) {
  require(ggplot2)
  n = sum(count) # total number of dots to be drawn
  b = round*round(n/round) # prettify breaks
  x = rep(x, count) # make x coordinates for dots
  if (is.null(breaks)) breaks = seq(0, 1, b/4/n)
  ggplot(data.frame(x = x), aes(x = x)) +
    geom_dotplot(method = "histodot", ...) +
    scale_y_continuous(breaks = breaks,
                       #limits = c(0, max(count)+1), # doesn't work
                       labels = breaks * n)
}
geom_dots(x = df$x, count = df$y)
# dotplot based on geom_point
ggplot_dot <- function(x, count, ...) {
  require(ggplot2)
  message("The count variable must be an integer")
  count = as.integer(count) # make sure these are counts
  n = sum(count) # total number of dots to be drawn
  x = rep(x, count) # make x coordinates for dots
  count = count[count > 0] # drop zero cases
  y = integer(0) # initialize y coordinates for dots
  for (i in seq_along(count))
    y <- c(y, 1:(count[i])) # compute y coordinates
  ggplot(data.frame(x = x, y = y), aes(x = x, y = y)) +
    geom_point(...) # draw one dot per positive count
}
ggplot_dot(x = df$x, count = df$y,
size = 11, shape = 21, fill = "orange", color = "black") + theme_gray(base_size = 18)
# ggsave("dotplot.png")
ggsave("dotplot.png", width = 12, height = 5.9)
Brief random comment: With the geom_point() solution, saving the plot involves tweaking the sizes (both the dot size and the plot height/width) just right so that the dots touch. With the geom_dotplot() solution, I rounded the labels to make them prettier. Unfortunately I was not able to cut off the axis at about 100: using limits or coord_cartesian() rescales the entire plot instead of cutting it. Note also that to use geom_dotplot() I created a vector of data based on the count, as I was unable to use the count variable directly (I expected stat = "identity" to do that, but I couldn't make it work).
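As a side note on the geom_point() version, the loop that builds the y coordinates can probably be replaced by base R's sequence(), which returns 1, 2, ..., count[i] for each entry and returns nothing for zero counts, so the zero cases need no special handling. The helper name stack_dots below is hypothetical and only a minimal sketch, shown on a three-value excerpt of the data:

```r
library(ggplot2)

# Hypothetical helper (not part of the original post): expand (x, count)
# pairs into one row per dot, with y = position of the dot in its stack.
stack_dots <- function(x, count) {
  count <- as.integer(count)
  data.frame(
    x = rep(x, count),   # zero counts contribute no rows
    y = sequence(count)  # 1, 2, ..., count[i] for each x[i]
  )
}

# Small excerpt of the data: one dot at 79, none at 80, two at 82
dots <- stack_dots(x = c(79, 80, 82), count = c(1, 0, 2))

p <- ggplot(dots, aes(x = x, y = y)) +
  geom_point(size = 11, shape = 21, fill = "orange", color = "black")
```

The dot size and the saved plot dimensions still have to be tuned by hand so that the dots touch; this only simplifies the data preparation.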
Coincidentally, I've also spent the past day fighting with geom_dotplot() and trying to make it show a count. I haven't figured out a way to make the y axis show actual numbers, but I have found a way to truncate the y axis. As you mentioned, coord_cartesian() and limits don't work, but coord_fixed() does, since it enforces a fixed ratio between y and x units:
library(tidyverse)
df <- structure(list(x = c(79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105), y = c(1, 0, 0, 2, 1, 2, 7, 3, 7, 9, 11, 12, 15, 8, 10, 13, 11, 8, 9, 2, 3, 2, 1, 3, 0, 1, 1)), class = "data.frame", row.names = c(NA, -27L))
df <- tidyr::uncount(df, y)
ggplot(df, aes(x)) +
  geom_dotplot(method = 'histodot', binwidth = 1) +
  scale_y_continuous(NULL, breaks = NULL) +
  # Make this as high as the tallest column
  coord_fixed(ratio = 15)
Using 15 as the ratio here works because the x-axis is also in the same units (i.e. single integers). If the x-axis is a percentage or log dollars or date or whatever, you have to tinker with the ratio until the y-axis is truncated enough.
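Rather than hard-coding 15, the ratio could be derived from the counts. This is only a sketch under two assumptions that are mine, not something the ggplot2 documentation promises: each dot is one binwidth wide in x units, and the dotplot's y scale spans roughly 0 to 1. Under those assumptions the ratio is binwidth times the tallest stack. The names counts, dots, and ratio are illustrative.

```r
library(ggplot2)

# Same counts as in the question, before expanding to one row per dot
counts <- data.frame(
  x = 79:105,
  y = c(1, 0, 0, 2, 1, 2, 7, 3, 7, 9, 11, 12, 15, 8, 10, 13, 11, 8, 9,
        2, 3, 2, 1, 3, 0, 1, 1)
)
dots <- data.frame(x = rep(counts$x, counts$y))  # one row per dot

binwidth <- 1
# Assumption: dot diameter == binwidth in x units, y scale spans about 0..1,
# so the panel needs to be (binwidth * tallest stack) x units high.
ratio <- binwidth * max(counts$y)

p <- ggplot(dots, aes(x)) +
  geom_dotplot(method = "histodot", binwidth = binwidth) +
  scale_y_continuous(NULL, breaks = NULL) +
  coord_fixed(ratio = ratio)
```

For this data the computed ratio matches the hand-picked value of 15. With a different binwidth or a non-default dotsize, expect to adjust the result by hand.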
Edited with method for combining plots
As I mentioned in a comment below, using patchwork to combine plots with coord_fixed() doesn't work well. However, if you manually set the heights (or widths) of the combined plots to the same values as the ratio in coord_fixed() and ensure that each plot has the same x axis, you can get pseudo-faceted plots:
# Make a subset of df
df2 <- df %>% slice(1:25)
plot1 <- ggplot(df, aes(x)) +
  geom_dotplot(method = 'histodot', binwidth = 1) +
  scale_y_continuous(NULL, breaks = NULL) +
  # Make this as high as the tallest column
  # Make xlim the same on both plots
  coord_fixed(ratio = 15, xlim = c(75, 110))
plot2 <- ggplot(df2, aes(x)) +
  geom_dotplot(method = 'histodot', binwidth = 1) +
  scale_y_continuous(NULL, breaks = NULL) +
  coord_fixed(ratio = 7, xlim = c(75, 110))
# Combine both plots in a single column, with each sized incorrectly
library(patchwork)
plot1 + plot2 +
plot_layout(ncol = 1)
# Combine both plots in a single column, with each sized appropriately
library(patchwork)
plot1 + plot2 +
plot_layout(ncol = 1, heights = c(15, 7) / (15 + 7))
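The two ratios (15 and 7) and the matching heights can also be computed from the expanded data instead of being read off the plots. A minimal base R sketch, assuming binwidth = 1 so that the ratio equals the tallest stack; tallest_stack, x_all, and x_sub are hypothetical names, with x_sub mirroring the first 25 rows used for df2:

```r
# Hypothetical helper: size of the tallest stack in a vector of raw values
tallest_stack <- function(x) max(table(x))

# Raw values, one per dot (same data as df after uncount())
x_all <- rep(79:105, times = c(1, 0, 0, 2, 1, 2, 7, 3, 7, 9, 11, 12, 15,
                               8, 10, 13, 11, 8, 9, 2, 3, 2, 1, 3, 0, 1, 1))
x_sub <- x_all[1:25]  # same subset as df2

ratios  <- c(tallest_stack(x_all), tallest_stack(x_sub))
heights <- ratios / sum(ratios)  # relative heights for the two panels
```

Each element of ratios would go into the corresponding coord_fixed(ratio = ...) call, and heights into plot_layout(ncol = 1, heights = heights).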