How to improve the aspect of ggplot histograms with log scales and discrete values

Tags:

r

ggplot2

I am trying to improve the clarity and appearance of a histogram of discrete values that I need to show on a log scale.

Please consider the following minimal working example (MWE):

library(ggplot2)

set.seed(99)
data <- data.frame(dist = as.integer(rlnorm(1000, sdlog = 2)))
class(data$dist)
ggplot(data, aes(x=dist)) + geom_histogram()

which produces:

[Figure: histogram of dist with the default bins]

and then

ggplot(data, aes(x=dist)) + geom_histogram() + scale_x_log10(breaks=c(1,2,3,4,5,10,100))

which is arguably even worse:

[Figure: histogram of dist on a log10 x-axis]

since it now gives the impression that something is missing between "1" and "2", and it is also not clear which bar has value "1" (the bar is to the right of the tick) and which has value "2" (the bar is to the left of the tick).
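On a log10 axis the distance between consecutive integers shrinks as the values grow, so bins of equal width on the log scale are much narrower than the gap from 1 to 2, and the bins in between can stay empty. A minimal base-R sketch of those gaps (the names `pos` and `gaps` are just illustrative):

```r
# Position of the integers 1..10 on a log10 axis
pos <- log10(1:10)

# Distance between each pair of consecutive integers on that axis
gaps <- round(diff(pos), 3)
gaps
# gaps is 0.301 0.176 0.125 0.097 0.079 0.067 0.058 0.051 0.046
```

The 1-to-2 gap is more than six times the 9-to-10 gap, which is why the left end of the log-scaled histogram looks sparse.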

I understand that, technically, ggplot gives the "right" visual answer for a log scale. Yet as an observer I find it hard to read.

Is there a way to make this plot clearer?

EDIT:

This is what happens when I apply Jaap's solution to my real data:

[Figure: Jaap's density plot applied to the real data]

Where do the dips between x=0 and x=1 and between x=1 and x=2 come from? My values are discrete, so why does the plot also show values at x=1.5 and x=2.5?
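The dips most likely come from the smoothing in stat_density: a kernel density estimate spreads each observation over a neighbourhood, so the curve is positive between integers, and on a log1p axis 0, 1 and 2 are far enough apart that the curve can sag between them. A minimal base-R sketch on the MWE data, assuming the default Gaussian kernel and nrd0 bandwidth of density() are close to what stat_density uses (not guaranteed identical); the helper `at()` is hypothetical:

```r
set.seed(99)
data <- data.frame(dist = as.integer(rlnorm(1000, sdlog = 2)))

# Kernel density estimate computed on the log1p scale, where the
# scale transformation puts the data before the stat runs
z <- log1p(data$dist)
d <- density(z)

# Hypothetical helper: read the smoothed curve off at given original x values
at <- function(x) approx(d$x, d$y, xout = log1p(x))$y

# No observation equals 1.5, yet the estimate there is still positive
vals <- at(c(1, 1.5, 2))
vals
```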


asked Jul 09 '14 06:07

CptNemo

2 Answers

The first thing that comes to mind is playing with the binwidth, but that doesn't give a great solution either:

ggplot(data, aes(x=dist)) +
  geom_histogram(binwidth=10) +
  scale_x_continuous(expand=c(0,0)) +
  scale_y_continuous(expand=c(0.015,0)) +
  theme_bw()

gives:

[Figure: histogram of dist with binwidth = 10]

In this case a density plot is probably a better choice. However, with scale_x_log10 you get a warning (Removed 524 rows containing non-finite values (stat_density)), because those rows have dist = 0 and log10(0) is -Inf. This can be avoided with a log-plus-one transformation, log(1 + x), which maps 0 to 0.
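A minimal base-R check of where the non-finite values come from, using the MWE data: as.integer() truncates every draw below 1 to 0, log10() sends those zeros to -Inf, and log1p() keeps them finite. The name `n_zero` is just illustrative:

```r
set.seed(99)
data <- data.frame(dist = as.integer(rlnorm(1000, sdlog = 2)))

# as.integer() truncates toward zero, so every draw below 1 becomes 0
n_zero <- sum(data$dist == 0)

# Only the zeros become non-finite under log10 (log10(0) is -Inf)
sum(!is.finite(log10(data$dist))) == n_zero   # TRUE

# log1p(x) = log(1 + x) maps 0 to 0 and is close to log(x) for large x
log1p(0)                 # 0
log1p(1000) - log(1000)  # small, about 0.001
```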

The following code:

library(ggplot2)
library(scales)

ggplot(data, aes(x=dist)) +
  stat_density(aes(y=after_stat(count)), color="black", fill="blue", alpha=0.3) +
  scale_x_continuous(breaks=c(0,1,2,3,4,5,10,30,100,300,1000), trans="log1p", expand=c(0,0)) +
  scale_y_continuous(breaks=c(0,125,250,375,500,625,750), expand=c(0,0)) +
  theme_bw()

will give this result:

[Figure: density plot of dist on a log1p x-axis]
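If the smoothing between integers is not wanted, one alternative (not part of the answer above) is to count each distinct value and draw one line per value. A minimal sketch, assuming ggplot2 is installed; it applies log1p to x by hand and relabels the breaks, so it does not depend on the `trans` argument, which ggplot2 3.5.0 renamed to `transform`. The names `counts`, `br` and `p` are illustrative:

```r
library(ggplot2)

set.seed(99)
data <- data.frame(dist = as.integer(rlnorm(1000, sdlog = 2)))

# Exact count per distinct integer value: no bins, no smoothing
counts <- as.data.frame(table(dist = data$dist), stringsAsFactors = FALSE)
counts$dist <- as.integer(counts$dist)

# Breaks shown in original units, placed at their log1p positions
br <- c(0, 1, 2, 3, 4, 5, 10, 30, 100, 300, 1000)

p <- ggplot(counts, aes(x = log1p(dist))) +
  # one vertical line per observed value, so nothing is drawn at 1.5 or 2.5
  geom_linerange(aes(ymin = 0, ymax = Freq)) +
  geom_point(aes(y = Freq)) +
  scale_x_continuous(name = "dist", breaks = log1p(br), labels = br) +
  scale_y_continuous(name = "count") +
  theme_bw()
print(p)
```

Because only observed integers get a line, each tick lines up with its own value, at the cost of the filled-area look of the density plot.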


answered Sep 28 '22 09:09

Jaap

What if the y-axis is log-scaled instead of the x-axis? This produces a few warnings for bins whose count is 0 (log10(0) is -Inf), but it may serve your purpose.

library(ggplot2)

set.seed(99)
data <- data.frame(dist = as.integer(rlnorm(1000, sdlog = 2)))
class(data$dist)
ggplot(data, aes(x=dist)) + geom_histogram() + scale_y_log10()

[Figure: Basic Graph]

You may also want to display the frequencies as data labels, since readers might overlook the y-scale, and it can take a while to notice that it is logarithmic.

ggplot(data, aes(x=dist)) + geom_histogram(fill = 'skyblue', color = 'grey30') + scale_y_log10() +
  stat_bin(geom="text", size=3.5, aes(label=after_stat(count), y=0.8*after_stat(count)))

[Figure: histogram with a log10 y-axis and count labels]
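The `0.8*count` placement suits a log axis because multiplying by a constant becomes a constant shift there: log10(0.8 * n) = log10(n) + log10(0.8), so every label sits the same distance (about 0.097 decades) below its bar top, whatever the bar height. A short base-R check with hypothetical counts:

```r
# Hypothetical bin counts spanning several orders of magnitude
counts <- c(5, 50, 500)

# Vertical offset between each label (0.8 * count) and its bar top on a log10 axis
offset <- log10(0.8 * counts) - log10(counts)
offset  # the same value, log10(0.8), for every bar
```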

answered Sep 28 '22 09:09

Gaurav Singhal
