 

Histogram with "negative" logarithmic scale in R

I have a dataset with some outliers, such as the following:

x <- rnorm(1000,0,20)
x <- c(x, 500, -500)

If we plot this on a linear x-axis scale, we see:

library(lattice)  # histogram() comes from the lattice package
histogram(x)

[Figure: histogram with a non-log x-axis]

I worked out a nice way to put it on a log scale using this useful thread, "How to use a log scale for y-axis of histogram in R?":

library(ggplot2)
mat <- data.frame(x)
ggplot(mat, aes(x = x)) + geom_histogram(colour = "darkblue", linewidth = 1, fill = "blue") +
  scale_x_log10()  # values <= 0 are dropped (with a warning)
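Before going further, it helps to see exactly what a plain log axis throws away. A minimal base-R sketch, reusing the question's simulated data; the `set.seed(1)` call is an assumption added only to make the example reproducible:

```r
set.seed(1)  # assumption: seed added only for reproducibility
x <- c(rnorm(1000, 0, 20), 500, -500)

# log10() returns NaN for negative values and -Inf for zero,
# so none of these observations can be placed on a log10 axis
logged <- suppressWarnings(log10(x))
n_dropped <- sum(!is.finite(logged))

cat("values a log10 axis cannot show:", n_dropped, "of", length(x), "\n")
```

With data centred on zero, roughly half of the observations (including the -500 outlier) disappear from the log-scale histogram.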

[Figure: histogram with a log x-axis]

However, I would like the x-axis labels of this second example to match those of the first, except with a kind of "negative log": moving left from the centre, the first tick could be -1, the next -10, the next -100, all equidistant. Does that make sense?
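One way to make this "negative log" idea concrete is a signed logarithm that mirrors log10 around zero. The helper names `signed_log10` and `signed_log10_inv`, and the `1 +` offset that keeps the function defined and continuous at 0, are illustrative assumptions, not something from this thread:

```r
# Hypothetical helper: sign(x) * log10(1 + |x|)
# - symmetric: signed_log10(-x) == -signed_log10(x)
# - defined at 0 and for negative values, unlike log10()
signed_log10 <- function(x) sign(x) * log10(1 + abs(x))

# Hypothetical inverse, turning axis positions back into data values
signed_log10_inv <- function(y) sign(y) * (10^abs(y) - 1)

# The tick values asked for above, and where they land on the new axis
ticks <- c(-100, -10, -1, 0, 1, 10, 100)
pos <- signed_log10(ticks)
print(round(pos, 3))
```

Because of the offset, the ticks at 10 and 100 sit roughly one unit apart (about 0.96), while 1 lands at about 0.3, closer to 0 than to 10. The offset is what buys a finite value at zero.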

Asked Jan 24 '13 by Jim Bo




1 Answer

I am not sure I understand your goal, but when you want a log-like transformation yet have zeroes or negative values, the inverse hyperbolic sine transformation asinh() is often a good option. It is log-like for large values and is defined for all real values. See Rob Hyndman's blog and this question on stats.stackexchange.com for discussion, details, and other options.
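The two properties of asinh() mentioned above are easy to check numerically. Its closed form is asinh(x) = log(x + sqrt(x^2 + 1)), so for large |x| it behaves like sign(x) * log(2|x|), and near 0 it behaves like x. A small base-R check (the test values are arbitrary choices):

```r
x <- c(-1000, -100, -10, -1, 0, 1, 10, 100, 1000)
y <- asinh(x)

# Odd function: negative values mirror positive ones, and asinh(0) == 0
mirror_ok <- isTRUE(all.equal(asinh(-x), -y))

# Log-like for large |x|: asinh(x) is close to sign(x) * log(2 * |x|)
big <- abs(x) >= 100
log_gap <- max(abs(y[big] - sign(x[big]) * log(2 * abs(x[big]))))

# Linear-like near zero: asinh(x) is close to x for small |x|
small <- c(-0.01, 0.01)
lin_gap <- max(abs(asinh(small) - small))

# Distance between successive decades on the asinh axis tends to log(10)
decade_gaps <- diff(asinh(c(1, 10, 100, 1000)))
print(round(decade_gaps, 3))
```

The decade gaps grow towards log(10), about 2.303, so ticks at -100, -10, 10 and 100 end up nearly equidistant, which is close to the axis the question describes. The gap between 0 and 1 is smaller, because asinh is roughly linear there.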

If this is an acceptable approach, you can create a custom scale for ggplot. The code below demonstrates how to create and use a custom scale (with custom breaks), along with a visualization of the asinh() transformation.

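library(ggplot2)
library(scales)

limits <- 100
step <- 0.005
demo <- data.frame(x = seq(from = -limits, to = limits, by = step))

# Custom transformation: asinh() maps data to the axis, sinh() maps breaks back
asinh_trans <- function() {
  trans_new(name = "asinh", transform = function(x) asinh(x),
            inverse = function(x) sinh(x))
}

# Passing the transformation object (not the string 'asinh') ensures
# this custom asinh_trans() is the one ggplot2 uses
ggplot(demo, aes(x, x)) + geom_point(size = 2) +
  scale_y_continuous(trans = asinh_trans(),
                     breaks = c(-100, -50, -10, -1, 0, 1, 10, 50, 100)) +
  theme_bw()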


ggplot(demo, aes(x, x)) + geom_point(size = 2) +
  scale_x_continuous(trans = asinh_trans(), breaks = c(0, 1, 10, 50, 100)) +
  scale_y_log10(breaks = c(0, 1, 10, 50, 100)) +  # zero and negative values won't plot on log10
  xlab("asinh() scale") + ylab("log10 scale") +
  theme_bw()
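Applied to the data from the question, the same custom scale gives a histogram that keeps the negative values and the ±500 outliers. A sketch, assuming the `asinh_trans()` definition above; the seed, `bins = 40` and the break positions are illustrative choices:

```r
library(ggplot2)
library(scales)

# Same custom transformation as above
asinh_trans <- function() {
  trans_new(name = "asinh", transform = function(x) asinh(x),
            inverse = function(x) sinh(x))
}

# Data from the question (seed is an assumption, added for reproducibility)
set.seed(1)
mat <- data.frame(x = c(rnorm(1000, 0, 20), 500, -500))

# Scale transformations are applied before binning, so the bins are
# equally wide on the asinh axis and negative values are kept
p <- ggplot(mat, aes(x = x)) +
  geom_histogram(bins = 40, colour = "darkblue", fill = "blue") +
  scale_x_continuous(trans = asinh_trans(),
                     breaks = c(-500, -100, -10, -1, 0, 1, 10, 100, 500)) +
  theme_bw()
print(p)
```

Because binning happens on the transformed axis, each bar covers a wider range of raw values the further it is from zero. This mirrors what `scale_x_log10()` does for positive data.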


Answered Oct 14 '22 by MattBagg