I want to plot the density of a variable whose summary is the following:
Min. :-1214813.0
1st Qu.: 1.0
Median : 40.0
Mean : 303.2
3rd Qu.: 166.0
Max. : 1623990.0
On a linear scale, the density plot is a tall spike in the range [0, 1000] with two very long tails towards positive and negative infinity. Hence, I'd like to transform the variable to a log scale so that I can see what's going on around the mean. For example, I'm thinking of something like:
log_values = c( -log10(-values[values<0]), log10(values[values>0]))
which results in:
Min. 1st Qu. Median Mean 3rd Qu. Max.
-6.085 0.699 1.708 1.286 2.272 6.211
The main problem with this is that it drops the 0 values. Of course, I can shift the values away from 0 with values[values>=0]+1, but this would introduce some distortion in the data.
What would be an accepted and scientifically solid way of transforming this variable to the log scale?
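One option worth comparing is the shift-by-1 idea applied symmetrically: sign(x) * log10(1 + abs(x)), sometimes called the log-modulus transformation. It keeps 0 at 0, preserves the sign and is monotone, at the cost of the distortion mentioned above for |x| close to 0. This is a minimal sketch; the helper name `signed_log10p` is hypothetical and the input vector only borrows the min and max from the summary, with other values made up for illustration.

```r
# Hypothetical helper: signed log10(1 + |x|) (log-modulus transform).
# Maps 0 to 0, keeps the sign, and is monotone increasing.
signed_log10p <- function(x) {
  sign(x) * log10(1 + abs(x))
}

values <- c(-1214813, -5, 0, 1, 40, 166, 1623990)  # illustrative values only

# The question's version: 0 falls in neither subset and is dropped
dropped <- c(-log10(-values[values < 0]), log10(values[values > 0]))
print(length(dropped))  # one element shorter than values

log_values <- signed_log10p(values)
print(round(log_values, 3))
```

Here every input keeps a transformed value, so the result can be passed straight to density().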
Log transformation in R is done by applying the log() function to a vector, data frame or other data set. log() itself does not add anything; a common variant adds 1 first, i.e. log(x + 1) or the equivalent log1p(x), so that a 0 value maps to 0 instead of -Inf. This only works for values greater than -1.
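For non-negative data, base R's log1p() computes log(1 + x) directly and stays accurate when x is very small, where forming 1 + x first loses precision. A short comparison, with a made-up vector `counts`:

```r
counts <- c(0, 1e-10, 9, 99)  # illustrative non-negative values

naive  <- log(counts + 1)  # add 1, then take the natural log
stable <- log1p(counts)    # same quantity, computed without forming 1 + x

print(naive)
print(stable)
```

Both map 0 to 0; they differ only in rounding for tiny inputs such as 1e-10.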
Log-Log Plot in Base R: one way is to pass log(data) instead of data to the plot() function, which plots the logarithms of the values; by default, log() computes natural logarithms. Alternatively, plot(x, y, log = "xy") keeps the raw values and draws both axes on a logarithmic scale.
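A sketch of both options, assuming made-up positive data `x` and `y` that roughly follow a power law: transforming before plotting, or keeping the raw data and using the `log` argument of plot(). On log-log axes a power law is a straight line whose slope is the exponent, which the final fit checks.

```r
set.seed(1)
x <- 10^runif(200, 0, 4)                    # illustrative positive data
y <- 3 * x^1.5 * exp(rnorm(200, sd = 0.2))  # roughly y = 3 * x^1.5

# Option 1: transform first; axes show natural-log values
plot(log(x), log(y), main = "log(data)")

# Option 2: raw data on logarithmic axes; ticks show original units
plot(x, y, log = "xy", main = "log = \"xy\"")

# Slope of log(y) against log(x) estimates the exponent (about 1.5 here)
fit <- lm(log(y) ~ log(x))
print(coef(fit))
```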
To create a boxplot of a variable on a log scale in base R, pass log = "y" to the boxplot() function. The "y" matters because a vertical boxplot plots its values on the Y axis. As with any logarithmic axis, all values must be positive.
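A minimal sketch, assuming strictly positive data; `positive_values` is a simulated log-normal sample used only for illustration. boxplot() also returns its summary statistics invisibly, which the last line prints.

```r
set.seed(2)
positive_values <- rlnorm(500, meanlog = 3, sdlog = 2)  # illustrative, all > 0

# Vertical boxplot: the data lie on the Y axis, so log = "y"
bp <- boxplot(positive_values, log = "y", main = "Boxplot with logarithmic Y axis")

# Five numbers per box: lower whisker, lower hinge, median, upper hinge, upper whisker
print(bp$stats)
```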
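What you have is essentially what @James suggests, i.e. sign(x)*log10(abs(x)). This is problematic for values in (-1,1), especially those close to the origin: log10 is negative there, so small positive values are mapped below 0 and small negative values above 0, and the transform is no longer monotone: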
x <- seq(-2, 2, by=.01)
plot(x, sign(x)*log10(abs(x)), pch='.')
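A few sample points make this concrete (`f` is just a local name for the same transform). Note that 0 itself gives NaN, since sign(0) * log10(0) is 0 * -Inf.

```r
# The transform from the question: sign(x) * log10(|x|)
f <- function(x) sign(x) * log10(abs(x))

pts <- c(-2, -0.1, 0, 0.1, 2)
print(f(pts))
# -0.1 maps to 1 while 0.1 maps to -1: the order is reversed near 0
```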
Something like this may help; it is the identity on [-1, 1] and a shifted log10 outside it:
y <- c(-log10(-x[x<(-1)])-1, x[x >= -1 & x <= 1], log10(x[x>1])+1)
plot(x, y, pch='.')
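Because the version above concatenates the three pieces with c(), its output lines up with x only when x is sorted, which holds for seq() but not for raw data. A sketch of the same transform evaluated element-wise (the name `pw_log10` is hypothetical):

```r
# Identity on [-1, 1]; outside, log10(|x|) + 1 with the sign of x restored.
# Works element-wise, so the output stays aligned with unsorted input.
pw_log10 <- function(x) {
  out <- x                            # identity on [-1, 1]
  big <- which(abs(x) > 1)            # indices handled by the log pieces
  out[big] <- sign(x[big]) * (log10(abs(x[big])) + 1)
  out
}

x <- seq(-2, 2, by = .01)
plot(x, pw_log10(x), pch = '.')
```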
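This is continuous. To make it C^1 (continuously differentiable) as well, use the interval (-1/log(10), 1/log(10)) instead: solving d/dx log10(x) = 1/(x log(10)) = 1 gives x = 1/log(10) ≈ 0.434, and shifting the outer pieces by ±(1/log(10) - log10(1/log(10))) ≈ ±0.797 makes the values match at the joins too: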
z <- c( -log10(-x[x<(-1/log(10))]) - 1/log(10)+log10(1/log(10)),
x[x >= -1/log(10) & x <= 1/log(10)],
log10(x[x>1/log(10)]) + 1/log(10)-log10(1/log(10))
)
plot(x, z, pch='.')
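The same construction can be wrapped in a function and applied to the data before calling density(). This is a sketch assuming element-wise evaluation, so the input need not be sorted; `c1_log10` is a hypothetical name, and the simulated Cauchy sample with added zeros only stands in for the real data.

```r
# C^1 signed-log transform: identity on [-a, a] with a = 1/log(10),
# where the slope of log10(x) is 1; outside, log10(|x|) is shifted so
# that value and slope both match at +/-a.
c1_log10 <- function(x) {
  a <- 1 / log(10)
  shift <- a - log10(a)               # ensures log10(a) + shift == a
  out <- x                            # identity on [-a, a]
  big <- which(abs(x) > a)            # which() skips NA safely
  out[big] <- sign(x[big]) * (log10(abs(x[big])) + shift)
  out
}

# Illustrative heavy-tailed sample with both signs and exact zeros
# (simulated, not the question's data)
set.seed(3)
sim <- c(round(rcauchy(5000, scale = 50)), rep(0, 500))

d <- density(c1_log10(sim))
plot(d, main = "Density on a C^1 signed-log10 scale")
```

The x axis is in transformed units; to label it in original units, one could pass c1_log10(v) as `at` and v as `labels` to axis() for a few chosen values v.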