I am trying to transform data in a vector in R.
This is not for linear regression, so I don't have a predictor and response relationship. I am simply using a model whose accuracy should improve if I normalize my data (hence I can't use the boxcox function, since it only works with linear models).
The data I'm trying to transform is:
vect
[1] 99.64 49.71 246.84 96.17 16.67 352.00 421.25 81.77 105.00 37.85
I have looked at this post.
It was not clear to me what was being done there or how the optimize function was being used, but I did manage to adapt it into a function that I would like to minimize:
# skewness() is not in base R; it is assumed to come from a package such as moments or e1071
xskew <- function(data, par) {
  abs(skewness((data^par - 1) / par))
}
I would like to input a sequence of values for lambda (perhaps between 0.5 and 1 with jumps of 0.01) and find which one of those values minimizes xskew for my dataset.
I have tried to do this with the optim function but with no luck, so I suspect it is not the right function for this. How do I perform this calculation?
Edit: I would like something along the lines of:
x <- seq(0.51,0.99,by=0.01)
which(xskew(vect,x) < 0.05)
So perhaps I would find a value under some threshold. This code obviously produces an error.
The Box-Cox linearity plot is a plot of the correlation between Y and the transformed X for given values of lambda. That is, lambda is the coordinate on the horizontal axis, and the correlation between Y and the transformed X is the coordinate on the vertical axis.
The Box-Cox transformation is a family of power transformations indexed by a parameter lambda. Whenever you use it, the parameter needs to be estimated from the data. In a time series, the process may have non-constant variance; if the variance changes with time, the process is nonstationary.
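To make the family concrete: for positive data it is usually written as (y^lambda - 1)/lambda when lambda is not 0, and log(y) when lambda is 0. Here is a minimal base R sketch of that definition. The name box_cox is a hypothetical helper for illustration, not a function from any package.

```r
# Hypothetical helper: standard Box-Cox transform for strictly positive data
box_cox <- function(y, lambda) {
  stopifnot(all(y > 0))
  if (abs(lambda) < 1e-8) {
    log(y)                      # limiting case as lambda approaches 0
  } else {
    (y^lambda - 1) / lambda     # power transform for any other lambda
  }
}

# lambda = 1 only shifts the data by 1; lambda = 0.5 is a rescaled square root
box_cox(c(1, 4, 9), 1)
box_cox(c(1, 4, 9), 0.5)
```

Note that the helper takes a single lambda at a time; it is not vectorised over lambda.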
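Note that y~1 counts as a linear model in R, so you can use the boxcox function from MASS:
library(MASS)  # provides boxcox()
tmp <- exp(rnorm(10))
out <- boxcox(lm(tmp~1))
# approximate 95% confidence interval for lambda from the profile log-likelihood
range(out$x[out$y > max(out$y)-qchisq(0.95,1)/2])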
I think that the most important part of that function is not that it finds a "best" lambda, but that it finds the confidence interval for lambda, then encourages you to think about what the different transformations mean and combine that with the science behind the data. If the "best" lambda for your data is 0.41, but the interval contains 0.5 and there is scientific reasoning why a square root transform makes sense, then why use 0.41 instead of 0.5?
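If you still want the skewness-minimizing lambda described in the question, here is one way it could be sketched in base R. The call in the question fails because xskew expects a single lambda, so the sketch evaluates it one value at a time with sapply. The skew helper is a hypothetical stand-in (simple moment-based skewness) so that no extra package is needed; a package's skewness() may use a slightly different estimator.

```r
# Data from the question
vect <- c(99.64, 49.71, 246.84, 96.17, 16.67, 352.00, 421.25, 81.77, 105.00, 37.85)

# Hypothetical helper: moment-based sample skewness (assumption, not a package function)
skew <- function(x) {
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

# Absolute skewness of the Box-Cox transformed data for ONE value of lambda
xskew <- function(data, par) {
  abs(skew((data^par - 1) / par))
}

# Grid search over the lambdas proposed in the question
lambdas <- seq(0.51, 0.99, by = 0.01)
skews <- sapply(lambdas, function(l) xskew(vect, l))
best_grid <- lambdas[which.min(skews)]

# Values under a threshold, as in the edit to the question (may be empty)
under_threshold <- lambdas[skews < 0.05]

# Continuous alternative: one-dimensional minimization with optimize()
best_opt <- optimize(function(l) xskew(vect, l), interval = c(0.51, 0.99))$minimum
```

If the minimum lands on the edge of the grid, the interval is probably too narrow and is worth widening (avoiding lambda = 0 exactly, where this formula divides by zero).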
To apply a Box-Cox transformation to a vector, use the forecast package in R:
library(forecast)
# to find optimal lambda
lambda = BoxCox.lambda( vector )
# now to transform vector
trans.vector = BoxCox( vector, lambda)