 

How to improve the aspect of ggplot histograms with log scales and discrete values

Tags:

r

ggplot2

I am trying to improve the clarity and aspect of a histogram of discrete values which I need to represent with a log scale.

Please consider the following MWE:

library(ggplot2)

set.seed(99)
data <- data.frame(dist = as.integer(rlnorm(1000, sdlog = 2)))
class(data$dist)  # "integer"
ggplot(data, aes(x = dist)) + geom_histogram()

which produces

enter image description here

and then

ggplot(data, aes(x = dist)) + geom_histogram() + scale_x_log10(breaks = c(1, 2, 3, 4, 5, 10, 100))

which is probably even worse

enter image description here

since it now gives the impression that something is missing between "1" and "2", and it is also not entirely clear which bar has value "1" (the bar is to the right of the tick) and which bar has value "2" (the bar is to the left of the tick).

I understand that ggplot technically provides the "right" visual answer for a log scale. Yet as an observer I have some trouble understanding it.

Is it possible to improve this?

EDIT:

This is what happens when I apply Jaap's solution to my real data:

enter image description here

Where do the dips between x=0 and x=1 and between x=1 and x=2 come from? My values are discrete, so why does the plot also show values at x=1.5 and x=2.5?

CptNemo asked Jul 09 '14 06:07




2 Answers

The first thing that comes to mind is playing with the binwidth. But that doesn't give a great solution either:

ggplot(data, aes(x=dist)) +
  geom_histogram(binwidth=10) +
  scale_x_continuous(expand=c(0,0)) +
  scale_y_continuous(expand=c(0.015,0)) +
  theme_bw()

gives: enter image description here
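Since the values are integers, one variation on the binwidth idea is a bin width of 1 with bins centred on the integers, which removes the doubt about which bar belongs to which value. This is only a sketch and not part of the original answer: the zoom window of 0 to 20 and the colours are arbitrary choices for illustration.

```r
library(ggplot2)

set.seed(99)
data <- data.frame(dist = as.integer(rlnorm(1000, sdlog = 2)))

# binwidth = 1 with center = 0 puts the bin edges at -0.5, 0.5, 1.5, ...
# so every bar is centred on the integer it counts.
p <- ggplot(data, aes(x = dist)) +
  geom_histogram(binwidth = 1, center = 0, fill = "grey70", color = "grey30") +
  # Zoom in on the small values without dropping any data
  coord_cartesian(xlim = c(-0.5, 20.5)) +
  scale_x_continuous(breaks = 0:20) +
  theme_bw()

print(p)
```

The long right tail is still hidden by the zoom, so this helps with reading the small values rather than with showing the whole distribution.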


In this case it is probably better to use a density plot. However, scale_x_log10 then produces a warning (Removed 524 rows containing non-finite values (stat_density)), because log10(0) is -Inf and the rows with value 0 are dropped. This can be resolved by using a log-plus-one transformation (log1p).
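To see where the non-finite values come from, a small base R check on the same simulated data may help. The variable names `dist` and `n_zero` are just for this illustration.

```r
set.seed(99)
dist <- as.integer(rlnorm(1000, sdlog = 2))

# as.integer() truncates toward zero, so every draw below 1 becomes 0
n_zero <- sum(dist == 0)
n_zero

log10(0)   # -Inf: a log10 axis has no place for zeros
log1p(0)   # 0: log(1 + x) keeps the zeros on the axis

# After log1p every value is finite, so no row has to be removed
all(is.finite(log1p(dist)))
```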

The following code:

library(ggplot2)
library(scales)

ggplot(data, aes(x=dist)) +
  stat_density(aes(y=..count..), color="black", fill="blue", alpha=0.3) +
  scale_x_continuous(breaks=c(0,1,2,3,4,5,10,30,100,300,1000), trans="log1p", expand=c(0,0)) +
  scale_y_continuous(breaks=c(0,125,250,375,500,625,750), expand=c(0,0)) +
  theme_bw()

will give this result: enter image description here
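On the dips raised in the question's edit: a density plot draws a smoothed kernel estimate evaluated on a continuous grid of x values, so it also shows a curve at positions such as 1.5 where no data exist, and it can sag between neighbouring integers. If that is unwanted, one alternative (a sketch, not part of the original answer) is to count each distinct value and draw one marker per value on the same log1p axis. The `counts` data frame and the lollipop style are assumptions made for this example.

```r
library(ggplot2)

set.seed(99)
data <- data.frame(dist = as.integer(rlnorm(1000, sdlog = 2)))

# One row per distinct integer value, with its frequency in Freq
counts <- as.data.frame(table(dist = data$dist), stringsAsFactors = FALSE)
counts$dist <- as.integer(counts$dist)

p <- ggplot(counts, aes(x = dist, y = Freq)) +
  # A thin stem and a point per value: nothing is drawn between integers
  geom_segment(aes(xend = dist, yend = 0), color = "grey40") +
  geom_point(color = "blue") +
  scale_x_continuous(breaks = c(0, 1, 2, 3, 4, 5, 10, 30, 100, 300, 1000),
                     trans = "log1p") +
  theme_bw()

print(p)
```

Stems are used instead of bars because a fixed bar width does not translate well to a transformed axis, where the spacing between integers shrinks as the values grow.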

Jaap answered Sep 28 '22 09:09


I wonder what happens if the y-axis is log-scaled instead of the x-axis. It results in a few warnings wherever the bin counts are 0, but it may serve your purpose.

set.seed(99)
data <- data.frame(dist = as.integer(rlnorm(1000, sdlog = 2)))
class(data$dist)
ggplot(data, aes(x=dist)) + geom_histogram() + scale_y_log10()

Basic Graph

Also, you may want to display the frequencies as data labels, since people might ignore the y-scale, and it takes some time to realize that the y-axis is logarithmic.

ggplot(data, aes(x=dist)) + geom_histogram(fill = 'skyblue', color = 'grey30') + scale_y_log10() +
  stat_bin(geom="text", size=3.5, aes(label=..count.., y=0.8*(..count..)))

enter image description here
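If the warnings for empty bins are a nuisance, a possible variation (a sketch, not part of the original answer) is to borrow the log1p transformation for the y-axis, since log1p(0) is 0 and empty bins stay finite. The break values, the `bins = 30` setting, and the label placement are arbitrary choices here; `after_stat()` assumes ggplot2 3.3.0 or later, and recent ggplot2 releases prefer the argument name `transform` over `trans`.

```r
library(ggplot2)

set.seed(99)
data <- data.frame(dist = as.integer(rlnorm(1000, sdlog = 2)))

p <- ggplot(data, aes(x = dist)) +
  geom_histogram(bins = 30, fill = "skyblue", color = "grey30") +
  # log1p(0) is 0, so empty bins do not turn into -Inf
  scale_y_continuous(trans = "log1p", breaks = c(0, 1, 10, 100, 1000)) +
  # Print the raw count just above each non-empty bar
  stat_bin(bins = 30, geom = "text", size = 3.5, vjust = -0.4,
           aes(label = after_stat(ifelse(count > 0, count, ""))))

print(p)
```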

Gaurav Singhal answered Sep 28 '22 09:09