
Transforming variable density on log scale with R

Tags: r, scale, logarithm

I want to plot the density of a variable whose range is the following:

 Min.   :-1214813.0  
 1st Qu.:       1.0  
 Median :      40.0  
 Mean   :     303.2  
 3rd Qu.:     166.0  
 Max.   : 1623990.0

The linear plot of the density results in a tall column in range [0,1000], with two very long tails towards positive infinity and negative infinity. Hence, I'd like to transform the variable to a log scale, so that I can see what's going on around the mean. For example, I'm thinking of something like:

log_values = c( -log10(-values[values<0]), log10(values[values>0]))

which results in:

Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
-6.085   0.699   1.708   1.286   2.272   6.211 

The main problem with this is that it doesn't include the 0 values. Of course, I can shift all the non-negative values away from 0 with values[values>=0]+1, but this would introduce some distortion in the data.

What would be an accepted and scientifically solid way of transforming this variable to the log scale?

Mulone asked Dec 23 '12 10:12



1 Answer

What you have is essentially what @James suggests. This is problematic for values in (-1,1), especially those close to the origin:

x <- seq(-2, 2, by=.01)
plot(x, sign(x)*log10(abs(x)), pch='.')
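To make the problem concrete, here is a minimal sketch that evaluates the same expression at a few points. The names signed_log10 and probe are chosen for this illustration only and are not part of the original answer.

```r
# Signed log10, the same expression as in the plot() call above.
# signed_log10 and probe are illustrative names, not from the original post.
signed_log10 <- function(x) sign(x) * log10(abs(x))

probe <- c(-10, -0.1, 0.1, 10)
signed_log10(probe)
# Approximately -1, 1, -1, 1: the small positive value 0.1 lands on -1
# (the same place as -10), and -0.1 lands on 1 (the same place as 10).
```

Inside (-1, 1) the logarithm is negative, so the sign flips and the ordering of the data is lost; at exactly 0 the expression is not even defined.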


Something like this may help:

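# Linear on [-1, 1]; the log branches are shifted by 1 so they meet the
# linear part at x = -1 and x = 1. Relies on x being sorted ascending.
y <- c(-log10(-x[x<(-1)])-1, x[x >= -1 & x <= 1], log10(x[x>1])+1)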

plot(x, y, pch='.')
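Building y with c() over three subsets only lines up with x because x is sorted. For real, unsorted data, the same transform could be wrapped in a function that keeps the input order. This is a sketch under that assumption; symlog10 is a hypothetical name, not an existing R function.

```r
# Hypothetical helper: identity on [-1, 1], shifted log10 outside.
# Works element-wise, so the output order matches the input order.
symlog10 <- function(x) {
  out <- x                                  # linear part stays as is
  big <- !is.na(x) & abs(x) > 1             # values outside [-1, 1]
  out[big] <- sign(x[big]) * (log10(abs(x[big])) + 1)
  out
}

symlog10(c(100, -0.5, 0, -100, 0.5))
# Approximately 3, -0.5, 0, -3, 0.5 (zeros are kept, order is preserved)

# With the asker's data this could then be used as, for example:
# plot(density(symlog10(values)))
```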


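This is continuous. One can also force it to be C^1 (continuous first derivative) by switching at -1/log(10) and 1/log(10) instead of -1 and 1. That point, about 0.434, is found by solving d/dx log10(x) = 1, where log is the natural logarithm as in R: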

z <- c(-log10(-x[x<(-1/log(10))]) - 1/log(10) + log10(1/log(10)),
       x[x >= -1/log(10) & x <= 1/log(10)],
       log10(x[x>1/log(10)]) + 1/log(10) - log10(1/log(10)))
plot(x, z, pch='.')
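The smooth variant can be sketched as an order-preserving function in the same way, together with a quick numerical check that the slope just outside the joint is close to 1. The names symlog10_smooth, a and h are assumptions made for this illustration.

```r
# Hypothetical helper: identity on [-a, a] with a = 1/log(10),
# log10 shifted by a - log10(a) outside, so value and slope match at the joints.
symlog10_smooth <- function(x) {
  a <- 1 / log(10)                          # where d/dx log10(x) equals 1
  out <- x
  big <- !is.na(x) & abs(x) > a
  out[big] <- sign(x[big]) * (log10(abs(x[big])) + a - log10(a))
  out
}

# Finite-difference slope just to the right of the joint; it should be
# close to 1, the slope of the linear part.
a <- 1 / log(10)
h <- 1e-6
slope_right <- (symlog10_smooth(a + h) - symlog10_smooth(a)) / h
slope_right
```

Because the function is odd, the same check holds at -a by symmetry.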


Matthew Lundberg answered Sep 23 '22 21:09