This code from @camille generates a nice Pareto plot with ggplot.
library(tidyverse)
d <- tribble(
  ~category, ~defect,
  "price", 80,
  "schedule", 27,
  "supplier", 66,
  "contact", 94,
  "item", 33
) %>%
  arrange(desc(defect)) %>%
  mutate(
    cumsum = cumsum(defect),
    freq = round(defect / sum(defect), 3),
    cum_freq = cumsum(freq)
  ) %>%
  mutate(category = as.factor(category) %>% fct_reorder(defect))
brks <- unique(d$cumsum)
ggplot(d, aes(x = fct_rev(category))) +
  geom_col(aes(y = defect)) +
  geom_point(aes(y = cumsum)) +
  geom_line(aes(y = cumsum, group = 1)) +
  scale_y_continuous(
    sec.axis = sec_axis(~ . / max(d$cumsum), labels = scales::percent),
    breaks = brks
  )
It's almost perfect, except that I'd like the secondary y-axis breaks to sit at the cumulative y-values. This can be achieved in base R with the following code. But how do I do it in ggplot?
## Creating the d tribble
library(tidyverse)
d <- tribble(
  ~category, ~defect,
  "price", 80,
  "schedule", 27,
  "supplier", 66,
  "contact", 94,
  "item", 33
)
## Creating new columns
d <- arrange(d, desc(defect)) %>%
  mutate(
    cumsum = cumsum(defect),
    freq = round(defect / sum(defect), 3),
    cum_freq = cumsum(freq)
  )
## Saving Parameters
def_par <- par(no.readonly = TRUE)
## New margins
par(mar=c(5,5,4,5))
## bar plot, pc will hold x values for bars
pc <- barplot(d$defect,
  width = 1, space = 0.2, border = NA, axes = FALSE,
  ylim = c(0, 1.05 * max(d$cumsum, na.rm = TRUE)),
  ylab = "Cumulative Counts", cex.names = 0.7,
  names.arg = d$category,
  main = "Pareto Chart (version 1)")
## Cumulative counts line
lines(pc, d$cumsum, type = "b", cex = 0.7, pch = 19, col="cyan4")
## Framing plot
box(col = "grey62")
## adding axes
axis(side = 2, at = c(0, d$cumsum), las = 1, col.axis = "grey62", col = "grey62", cex.axis = 0.8)
axis(side = 4, at = c(0, d$cumsum), labels = paste0(c(0, round(d$cum_freq * 100)), "%"),
  las = 1, col.axis = "cyan4", col = "cyan4", cex.axis = 0.8)
## Restoring default parameters
par(def_par)
Camille suggested an approach, but the problem remained: "The more recent versions of ggplot2 allow for a secondary axis, but it needs to be based on a transformation of the primary axis. In this case, that means it should take the primary axis's values and divide by the maximum value to get a percentage."
brks <- unique(d$cumsum)
brks2 <- unique(d$cumsum / max(d$cumsum))
ggplot(d, aes(x = fct_rev(category))) +
  geom_col(aes(y = defect)) +
  geom_point(aes(y = cumsum)) +
  geom_line(aes(y = cumsum, group = 1)) +
  scale_y_continuous(
    sec.axis = sec_axis(~ . / max(d$cumsum), labels = scales::percent, breaks = brks2),
    breaks = brks
  )
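A minimal base R sketch of why brks2 lines up with brks: breaks passed to sec_axis() are given in secondary-axis units, and the transformation ~ . / max(d$cumsum) maps each primary value y to y / max(d$cumsum). Dividing the primary breaks by the same maximum therefore puts each percentage tick level with a cumulative-count tick. The plain vectors and the to_secondary() helper below are hypothetical stand-ins for the columns of d and the sec_axis() formula, not part of the original code.

```r
# Defect counts from the question, already sorted in descending order
defect <- c(contact = 94, price = 80, supplier = 66, item = 33, schedule = 27)
cum_counts <- unname(cumsum(defect))  # same values as d$cumsum

# Hypothetical helper mirroring the sec_axis() formula ~ . / max(d$cumsum)
to_secondary <- function(y) y / max(cum_counts)

# Primary breaks at the cumulative counts (0 prepended so the axis gets a tick at 0)
primary_breaks <- c(0, cum_counts)

# Passing the primary breaks through the same transformation gives
# secondary breaks that sit at exactly the same heights on the plot
secondary_breaks <- to_secondary(primary_breaks)

print(primary_breaks)
print(round(secondary_breaks, 4))
```

With this data the secondary breaks come out as 0, roughly 0.3133, 0.58, 0.8, 0.91 and 1, i.e. each cumulative count divided by the total of 300.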
The only improvement this makes over my code from the previous question and @Jack Brookes's answer is that it eliminates the need to calculate the two sets of breaks outside of the ggplot call. Instead, I take the breaks for the cumulative raw counts as unique(d$cumsum) and the breaks for the cumulative frequencies as unique(d$cum_freq). To both of these I prepended a 0, because otherwise no break is placed at 0.
library(tidyverse)
library(scales)
d <- tribble(
  ~category, ~defect,
  "price", 80,
  "schedule", 27,
  "supplier", 66,
  "contact", 94,
  "item", 33
) %>%
  arrange(desc(defect)) %>%
  mutate(
    cumsum = cumsum(defect),
    freq = round(defect / sum(defect), 3),
    cum_freq = cumsum(freq)
  ) %>%
  mutate(category = as.factor(category) %>% fct_reorder(defect))
ggplot(d, aes(x = fct_rev(category))) +
  geom_col(aes(y = defect)) +
  geom_point(aes(y = cumsum)) +
  geom_line(aes(y = cumsum, group = 1)) +
  scale_y_continuous(
    breaks = c(0, unique(d$cumsum)),
    sec.axis = sec_axis(~ . / max(d$cumsum), labels = scales::percent,
                        breaks = c(0, unique(d$cum_freq)))
  ) +
  theme(panel.grid.minor = element_blank())
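One small caveat with breaks = c(0, unique(d$cum_freq)): cum_freq is the cumulative sum of freq, which was rounded to three decimals, so its values can differ slightly from d$cumsum / max(d$cumsum), the exact positions of the primary breaks after transformation. With this data the first secondary break lands at 0.313 rather than 94/300 ≈ 0.3133. The base R sketch below uses plain vectors as hypothetical stand-ins for the columns of d and measures that gap.

```r
# Defect counts in descending order (stand-ins for d$defect)
defect <- c(94, 80, 66, 33, 27)

# Cumulative frequencies as built in the answer: round first, then accumulate
freq <- round(defect / sum(defect), 3)
cum_freq <- cumsum(freq)

# Exact cumulative proportions: the primary breaks mapped through ~ . / max(cumsum)
exact <- cumsum(defect) / sum(defect)

# Largest vertical mismatch between a secondary tick and its primary tick
max_gap <- max(abs(cum_freq - exact))
print(max_gap)
```

The gap is only about 0.0003 here, so it is invisible at normal plot sizes. If you want the two sets of ticks to match exactly, one option (a suggestion, not part of the original answer) is to derive the secondary breaks from the primary ones, e.g. breaks = c(0, unique(d$cumsum)) / max(d$cumsum) inside sec_axis().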