
Stop geom_density_ridges from showing non-existent tail values

When I use geom_density_ridges(), the plot often ends up showing long tails of values that don't exist in the data.

Here's an example:

library(tidyverse)
library(ggridges)

data("lincoln_weather")

# Remove all negative values for "Minimum Temperature"
d <- lincoln_weather[lincoln_weather$`Min Temperature [F]`>=0,]

ggplot(d, aes(`Min Temperature [F]`, Month)) +
  geom_density_ridges(rel_min_height=.01)

As you can see, the ridges for January, February, and December all extend into negative temperatures, but there are no negative values in the data at all.

Of course, I can add limits to the x-axis, but that doesn't solve the problem because it just truncates the existing erroneous density.

ggplot(d, aes(`Min Temperature [F]`, Month)) +
  geom_density_ridges(rel_min_height=.01) +
  xlim(0,80)

Now the plot makes it look as though January and February include days at 0 degrees (they don't). It also makes it look as though 0 degrees occurred often in December, when in reality there was only one such day.

How can I fix this?

Asked Apr 18 '18 at 18:04 by John J.


2 Answers

One option is to use stat_density() instead of stat_density_ridges(). There are some things that stat_density() can't do, such as drawing vertical lines or overlaying points, but on the flip side it can do some things that stat_density_ridges() can't do, such as trimming the distributions to the data ranges.

# Remove all negative values for "Minimum Temperature"
d <- lincoln_weather[lincoln_weather$`Min Temperature [F]`>=0,]

# after_stat(density) replaces the deprecated ..density.. notation (ggplot2 >= 3.3.0)
ggplot(d, aes(`Min Temperature [F]`, Month, group = Month, height = after_stat(density))) +
  geom_density_ridges(stat = "density", trim = TRUE)

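To see where the tails come from, here is a minimal base-R sketch. It uses a made-up sample `x`, not the lincoln_weather data, and the names `kde_default` and `kde_trimmed` are only illustrative. By default, `density()` evaluates the kernel estimate on a grid that extends three bandwidths (`cut = 3`) beyond the observed range. As far as I understand it, trimming amounts to restricting that grid to the range of each group; the estimate itself is not changed or renormalized.

```r
# Hypothetical sample: 200 non-negative values (not the lincoln_weather data)
set.seed(42)
x <- runif(200, min = 0, max = 30)

# Default kernel density estimate: the evaluation grid extends 3 bandwidths
# (cut = 3) beyond the smallest and largest observations
kde_default <- density(x)

# Same estimate, evaluated only between the observed minimum and maximum
kde_trimmed <- density(x, from = min(x), to = max(x))

range(x)              # observed data range
range(kde_default$x)  # wider than the data: starts below min(x)
range(kde_trimmed$x)  # matches the observed data range
```

Because the trimmed curve is simply cut off at the data range, it can still end at a nonzero height at the smallest or largest observation.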

As an alternative, you could draw a point rug. Maybe that serves your purpose as well or better:

ggplot(d, aes(`Min Temperature [F]`, Month)) +
  geom_density_ridges(rel_min_height = 0.01, jittered_points = TRUE,
                      position = position_points_jitter(width = 0.5, height = 0),
                      point_shape = "|", point_size = 2,
                      alpha = 0.7)


Note: those two approaches cannot currently be combined. That would require some modifications to the stat code.

Answered Nov 17 '22 at 02:11 by Claus Wilke


Well, turns out I should have just read the documentation more closely. The key part is:

"The ggridges package provides two main geoms, geom_ridgeline and geom_density_ridges. The former takes height values directly to draw ridgelines, and the latter first estimates data densities and then draws those using ridgelines."

There are multiple ways to handle this issue. Here is one:

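# after_stat(density) replaces the deprecated ..density.. notation (ggplot2 >= 3.3.0)
ggplot(d, aes(`Min Temperature [F]`, Month, height = after_stat(density))) +
  geom_density_ridges(stat = "binline", binwidth = 1,
                      draw_baseline = FALSE)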


Answered Nov 17 '22 at 03:11 by John J.