Transforming variable density on log scale with R

Tags:

r

scale

logarithm

I want to plot the density of a variable whose summary is the following:

 Min.   :-1214813.0  
 1st Qu.:       1.0  
 Median :      40.0  
 Mean   :     303.2  
 3rd Qu.:     166.0  
 Max.   : 1623990.0

The linear plot of the density results in a tall column in range [0,1000], with two very long tails towards positive infinity and negative infinity. Hence, I'd like to transform the variable to a log scale, so that I can see what's going on around the mean. For example, I'm thinking of something like:

log_values = c( -log10(-values[values<0]), log10(values[values>0]))

which results in:

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
 -6.085   0.699   1.708   1.286   2.272   6.211

The main problem with this is that it drops the 0 values. Of course, I can shift all the non-negative values away from 0 with values[values>=0]+1, but this would introduce some distortion in the data.

What would be an accepted and scientifically solid way of transforming this variable to the log scale?

asked Dec 23 '12 10:12

Mulone

1 Answer

What you have is essentially what @James suggests. This is problematic for values in (-1,1), especially those close to the origin:

x <- seq(-2, 2, by=.01)
plot(x, sign(x)*log10(abs(x)), pch='.')

[Figure: plot of sign(x)*log10(abs(x)) against x, https://i.stack.imgur.com/zNsKb.png]
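To make the problem concrete, here is a small sketch that evaluates the same expression at a few hand-picked points (the sample values and the name `naive` are illustrative assumptions, not from the original post). Inputs inside (-1, 1) come out with the opposite sign and grow without bound near the origin, and an exact 0 yields NaN.

```r
# Illustrative inputs chosen around the origin (not from the question's data).
x <- c(-0.5, -0.001, 0, 0.001, 0.5, 10)

# The signed-log expression from the plot above.
naive <- sign(x) * log10(abs(x))

# Values in (-1, 1) flip sign, and 0 gives 0 * -Inf, which is NaN in R.
print(naive)
```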

Something like this may help:

y <- c(-log10(-x[x<(-1)])-1, x[x >= -1 & x <= 1], log10(x[x>1])+1)

plot(x, y, pch='.')

[Figure: plot of y against x, https://i.stack.imgur.com/0mJ4R.png]
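The `c(...)` construction above relies on `x` being sorted, because it concatenates the negative, middle and positive parts in that order. For real data that is usually not the case, so one possible sketch is to wrap the same piecewise rule in a function that works element-wise. The name `signed_log10` is a hypothetical helper, not something from this answer or from base R.

```r
# Hypothetical helper: identity on [-1, 1], shifted log10 outside,
# applied element-wise so the order of the input is preserved.
signed_log10 <- function(v) {
  out <- v                            # values in [-1, 1], including 0, pass through
  big <- !is.na(v) & abs(v) > 1       # elements that take the log branch
  out[big] <- sign(v[big]) * (log10(abs(v[big])) + 1)
  out
}

# Unsorted input is handled, and zeros are kept.
print(signed_log10(c(10, -1000, 0, 0.5, -1)))
```

On a sorted grid such as `x <- seq(-2, 2, by=.01)` this should reproduce the `y` computed above.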

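This is continuous. The first derivative can be made continuous as well (C^1) by switching branches at ±1/log(10) instead of ±1, where log is the natural logarithm, as in R. That threshold comes from solving d/dx log10(x) = 1/(x log(10)) = 1, so the log branch meets the identity with slope 1: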

z <- c( -log10(-x[x<(-1/log(10))]) - 1/log(10)+log10(1/log(10)),
         x[x >= -1/log(10) & x <= 1/log(10)],
         log10(x[x>1/log(10)]) + 1/log(10)-log10(1/log(10))
       )
plot(x, z, pch='.')

[Figure: plot of z against x, https://i.stack.imgur.com/JIWps.png]
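As a rough sketch of how this could be applied to the density in the question, the C^1 rule can be wrapped in an element-wise function and passed to `density()`. The function name `smooth_signed_log10` is hypothetical, and the `values` vector below is simulated stand-in data (Cauchy draws plus some exact zeros), not the asker's data.

```r
# Hypothetical helper: identity on [-a, a] with a = 1/log(10),
# shifted log10 outside, matching the z construction above.
smooth_signed_log10 <- function(v) {
  a <- 1 / log(10)                    # point where the slope of log10(x) equals 1
  out <- v                            # values in [-a, a], including 0, pass through
  big <- !is.na(v) & abs(v) > a       # elements that take the log branch
  out[big] <- sign(v[big]) * (log10(abs(v[big])) + a - log10(a))
  out
}

# Simulated stand-in for `values`: heavy tails on both sides plus exact zeros.
set.seed(1)
values <- c(rcauchy(1000, location = 40, scale = 50), rep(0, 20))

# Kernel density estimate on the transformed scale.
d <- density(smooth_signed_log10(values))
```

Calling `plot(d)` then draws the density on the transformed axis. Tick labels are in transformed units, so they need to be mapped back through the inverse of the transform if original units are wanted on the axis.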

answered Sep 23 '22 21:09

Matthew Lundberg
