When I use geom_density_ridges(), the plot often ends up showing long tails of values that don't exist in the data.
Here's an example:
library(tidyverse)
library(ggridges)
data("lincoln_weather")
# Remove all negative values for "Minimum Temperature"
d <- lincoln_weather[lincoln_weather$`Min Temperature [F]`>=0,]
ggplot(d, aes(`Min Temperature [F]`, Month)) +
geom_density_ridges(rel_min_height=.01)
As you can see, January, February, and December all show negative temperatures, but there are no negative values in the data at all.
Of course, I can add limits to the x-axis, but that doesn't solve the problem because it just truncates the existing erroneous density.
ggplot(d, aes(`Min Temperature [F]`, Month)) +
geom_density_ridges(rel_min_height=.01) +
xlim(0,80)
Now the plot makes it look like there are values of 0 for January and February (there are none). It also makes it look like 0 degrees occurred often in December, when in reality there was only 1 such day.
How can I fix this?
One option is to use stat_density() instead of stat_density_ridges(). There are some things that stat_density() can't do, such as drawing vertical lines or overlaying points, but on the flip side it can do some things that stat_density_ridges() can't do, such as trimming the distributions to the data ranges.
# Remove all negative values for "Minimum Temperature"
d <- lincoln_weather[lincoln_weather$`Min Temperature [F]`>=0,]
ggplot(d, aes(`Min Temperature [F]`, Month, group = Month, height = after_stat(density))) +
geom_density_ridges(stat = "density", trim = TRUE)
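To see what trimming changes, here is a minimal sketch using base R's density() on made-up data (the sample `x` is hypothetical, not the lincoln_weather data). By default, density() evaluates the estimate on a grid that extends `cut * bw` (with `cut = 3`) beyond the data extremes, which is where the tails come from; restricting the grid to the observed range is roughly what trimming amounts to.

```r
# Hypothetical non-negative sample standing in for one month of temperatures
set.seed(1)
x <- runif(30, min = 0, max = 20)

# Default: the evaluation grid extends cut * bw (cut = 3) past the data range
d_full <- density(x)

# Trimmed: evaluate the same estimate only between the observed min and max
d_trim <- density(x, from = min(x), to = max(x))

range(x)         # observed data range
range(d_full$x)  # reaches below min(x) and above max(x)
range(d_trim$x)  # identical to the data range
```

Note that trimming only hides the tails; the bandwidth and the shape of the curve inside the data range stay the same.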
As an alternative, you could draw a point rug; maybe that serves your purpose as well or better:
ggplot(d, aes(`Min Temperature [F]`, Month)) +
geom_density_ridges(rel_min_height = 0.01, jittered_points = TRUE,
position = position_points_jitter(width = 0.5, height = 0),
point_shape = "|", point_size = 2,
alpha = 0.7)
Note: those two approaches cannot currently be combined; that would require some modifications to the stat code.
Well, turns out I should have just read the documentation more closely. The key part is:
"The ggridges package provides two main geoms, geom_ridgeline and geom_density_ridges. The former takes height values directly to draw ridgelines, and the latter first estimates data densities and then draws those using ridgelines."
There are multiple ways to handle this issue. Here is one:
ggplot(d, aes(`Min Temperature [F]`, Month, height = after_stat(density))) +
geom_density_ridges(stat = "binline", binwidth = 1,
draw_baseline = FALSE)
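The reason a binned estimate avoids the phantom tails can be sketched with base R's hist() on made-up data (the sample `x` and the break positions are assumptions for illustration; stat = "binline" may align its bins differently). Bins only count observations, so nothing is drawn outside the bins that actually contain data.

```r
# Hypothetical non-negative sample standing in for one month of temperatures
set.seed(1)
x <- runif(30, min = 0, max = 20)

# Bins of width 1 that just cover the observed data
breaks <- seq(floor(min(x)), ceiling(max(x)), by = 1)
h <- hist(x, breaks = breaks, plot = FALSE)

range(h$breaks)                  # bins stop at the edges of the data
sum(h$density * diff(h$breaks))  # bar areas still sum to 1
```

The trade-off is a blockier shape than a kernel density, with the result depending on the chosen bin width.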