
Log scale density coloring for geom_hex

Tags: r, ggplot2

The geom_hex geometry in ggplot2 colors hexagonal bins according to the number of points falling within them. This works pretty well for uniformly distributed data, but not so well if some regions are much denser than others: differences among the sparser bins get drowned out by the presence of a single very dense hexagon.

How can I make the density color scale use a log scale or some other kind of normalizing transformation?

Matt asked Sep 21 '18 17:09


1 Answer

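ggplot2 3.0+ exposes the variables computed by a stat through the stat() helper, which makes it easier to modify the statistic used for the fill of the hexes. (From ggplot2 3.3.0 the same helper is also available as after_stat(), and stat() is deprecated as of 3.4.0; the examples here work with either spelling.) So for example: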

Default count statistic

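library(ggplot2)  # geom_hex() also needs the hexbin package installed

df <- data.frame(
  x = rnorm(1000),
  y = rnorm(1000)
)

# fill each hexagon by the number of points it contains (the default)
plot.df <- ggplot(data = df, aes(x = x, y = y)) +
  geom_hex(aes(fill = stat(count)))
print(plot.df)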

Log count statistic

plot.df.log <- ggplot(data = df, aes(x = x, y = y)) +
  geom_hex(aes(fill = stat(log(count))))
print(plot.df.log)

In place of log, you can use any transformation you want, such as a cube root.
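As a rough sketch of the cube-root variant: the snippet below assumes the hexbin package is installed (geom_hex needs it) and uses after_stat(), the newer spelling of stat() available from ggplot2 3.3.0; on older versions stat() can be swapped in. The seed and the object name plot.df.cbrt are illustrative only.

```r
library(ggplot2)  # geom_hex() also requires the hexbin package

set.seed(42)  # arbitrary seed, only so the plot is reproducible
df <- data.frame(x = rnorm(1000), y = rnorm(1000))

# Fill by the cube root of the per-hexagon count instead of the raw count
plot.df.cbrt <- ggplot(data = df, aes(x = x, y = y)) +
  geom_hex(aes(fill = after_stat(count^(1/3)))) +
  labs(fill = "count^(1/3)")
print(plot.df.cbrt)
```

As with the log version, the legend shows transformed values rather than raw counts.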

Using cut

To avoid creating a scale with confusing values, you could use cut to establish sensible category boundaries, and convert these to a numeric scale which is labeled with the original count values:

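plot.df.log.cut <- ggplot(data = df, aes(x = x, y = y)) +
  # labels = FALSE makes cut() return integer bin codes 1-4 for counts 1, 2, 3-4 and 5+
  geom_hex(aes(fill = stat(cut(log(count), breaks = log(c(0, 1, 2, 4, Inf)), labels = FALSE, right = TRUE, include.lowest = TRUE)))) +
  # label each bin code with the range of counts it actually covers
  scale_fill_continuous(name = 'count', breaks = 1:4, limits = c(1, 4), labels = c('1', '2', '3-4', '5+'))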
print(plot.df.log.cut)
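A different route, not part of the answer above and sketched here under assumptions, is to keep the raw count as the fill and transform the colour scale itself, so the legend stays in count units without binning by hand. This relies on the trans argument of scale_fill_gradient(); in ggplot2 3.5.0 and later that argument is named transform, with trans still accepted but deprecated. The breaks and the object name plot.df.scale are illustrative, and hexbin is again assumed to be installed.

```r
library(ggplot2)  # geom_hex() also requires the hexbin package

set.seed(42)  # arbitrary seed, only so the plot is reproducible
df <- data.frame(x = rnorm(1000), y = rnorm(1000))

# geom_hex() fills by count by default; the scale maps count to colour on a log10 axis
plot.df.scale <- ggplot(data = df, aes(x = x, y = y)) +
  geom_hex() +
  scale_fill_gradient(name = "count", trans = "log10", breaks = c(1, 2, 5, 10))
print(plot.df.scale)
```

Because the transformation lives in the scale rather than in the aesthetic, the legend labels are the original counts; breaks that fall outside the observed range are simply not drawn.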

jdobres answered Nov 11 '22 04:11