
Finding Optimal Lambda for Box-Cox Transform in R

I am trying to transform data in a vector in R.

This is not for linear regression, so I don't have a predictor and response relationship. I am simply using a model whose accuracy should improve if my data are closer to normal (hence I can't use the boxcox function, since it only works with linear models).

The data I'm trying to transform is:

vect
 [1]  99.64  49.71 246.84  96.17  16.67 352.00 421.25  81.77 105.00  37.85

I have looked at this post.

It was not clear to me what was being done there or how the optimize function was being used, but I did manage to adapt its function into one that I would like to minimize:

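library(moments)  # assumed source of skewness(); e1071 also provides one
xskew <- function(data, par) {
  abs(skewness((data^par - 1) / par))  # absolute skewness of the Box-Cox transformed data
}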

I would like to input a sequence of values for lambda (perhaps between 0.5 and 1 with jumps of 0.01) and find which one of those values minimizes xskew for my dataset.

I have tried to do this with the optim function, but with no luck, so I suspect it is not the right function for me. How do I perform this calculation?
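A minimal sketch of this minimization, under stated assumptions: for a single parameter, base R's optimize() searches an interval directly, which is usually a better fit than optim(). The skew() helper is a hypothetical moment-based stand-in for whichever skewness() function is in use, and the interval 0.5 to 1 is the range mentioned above.

```r
# Data from the question
vect <- c(99.64, 49.71, 246.84, 96.17, 16.67, 352.00, 421.25, 81.77, 105.00, 37.85)

# Hypothetical stand-in for a package skewness(): moment-based sample skewness
skew <- function(x) {
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

# Absolute skewness of the Box-Cox transformed data for a single lambda
xskew <- function(data, par) abs(skew((data^par - 1) / par))

# One-dimensional search for the lambda in [0.5, 1] with the smallest |skewness|
res <- optimize(function(l) xskew(vect, l), interval = c(0.5, 1))

res$minimum    # lambda found by the search
res$objective  # |skewness| at that lambda
```

If the smallest skewness within the interval sits at one of its ends, the search will settle next to that end, so it is worth checking whether a wider interval is appropriate.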

Edit: I would like something along the lines of:

 x <- seq(0.51,0.99,by=0.01)
 which(xskew(vect,x) < 0.05)

So perhaps I would find a value under some threshold. This code obviously produces an error.
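The attempt above passes the whole sequence to xskew at once, so data^par recycles the lambdas against the data instead of producing one skewness per lambda. A sketch of the intended grid search, assuming a hypothetical moment-based skew() helper in place of a package skewness(), evaluates xskew once per candidate with sapply:

```r
# Data from the question
vect <- c(99.64, 49.71, 246.84, 96.17, 16.67, 352.00, 421.25, 81.77, 105.00, 37.85)

# Hypothetical stand-in for a package skewness(): moment-based sample skewness
skew <- function(x) {
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

# Absolute skewness of the Box-Cox transformed data for a single lambda
xskew <- function(data, par) abs(skew((data^par - 1) / par))

# Evaluate xskew separately for every candidate lambda
lambdas <- seq(0.51, 0.99, by = 0.01)
skews <- sapply(lambdas, function(l) xskew(vect, l))

best <- lambdas[which.min(skews)]  # grid value with the smallest |skewness|
under <- lambdas[skews < 0.05]     # grid values under the threshold (may be empty)
```

Here `under` can have length zero if no lambda on the grid brings the skewness below the threshold, while `best` always returns the minimizer on the grid.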

Michal asked Oct 28 '14 20:10



2 Answers

Note that y~1 counts as a linear model in R, so you can use the boxcox function from MASS:

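library(MASS)                 # provides boxcox()
tmp <- exp(rnorm(10))         # example log-normal data
out <- boxcox(lm(tmp ~ 1))    # profile log-likelihood over a grid of lambda values
# approximate 95% confidence interval for lambda
range(out$x[out$y > max(out$y) - qchisq(0.95, 1) / 2])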

I think that the most important part of that function is not that it finds a "best" lambda, but that it finds the confidence interval for lambda, then encourages you to think about what the different transformations mean and combine that with the science behind the data. If the "best" lambda for your data is 0.41, but the interval contains 0.5 and there is scientific reasoning why a square root transform makes sense, then why use 0.41 instead of 0.5?
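The interval-based reasoning above can be sketched on the data from the question. This is an illustration under assumptions: the grid step of 0.01 and the list of "interpretable" candidate lambdas are choices made for this example, not part of the answer, and the model is fitted with y = TRUE and qr = TRUE so that boxcox() does not need to refit it.

```r
library(MASS)  # provides boxcox()

# Data from the question
vect <- c(99.64, 49.71, 246.84, 96.17, 16.67, 352.00, 421.25, 81.77, 105.00, 37.85)

# Intercept-only linear model; keep y and qr so boxcox() can use the fit directly
fit <- lm(vect ~ 1, y = TRUE, qr = TRUE)

# Profile log-likelihood on a fine lambda grid, without drawing the plot
out <- boxcox(fit, lambda = seq(-2, 2, by = 0.01), plotit = FALSE)

best <- out$x[which.max(out$y)]  # lambda with the highest log-likelihood
ci <- range(out$x[out$y > max(out$y) - qchisq(0.95, 1) / 2])  # approx. 95% interval

# Illustrative "interpretable" lambdas: reciprocal, 1/sqrt, log, sqrt, identity
candidates <- c(-1, -0.5, 0, 0.5, 1)
inside <- candidates[candidates >= ci[1] & candidates <= ci[2]]
```

Any value in `inside` is statistically compatible with the data, so the choice among them can rest on what makes sense for the science rather than on the exact maximizer `best`.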

Greg Snow answered Oct 24 '22 03:10


To apply a Box-Cox transformation to a vector, use the forecast package in R:

library(forecast)
# find the optimal lambda (vect is the numeric vector to transform)
lambda <- BoxCox.lambda(vect)
# transform the vector using that lambda
trans.vector <- BoxCox(vect, lambda)
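For reference, the standard one-parameter Box-Cox formula that such a transform applies can be written in base R as below. This is a sketch of the textbook definition for positive data, with a hypothetical helper name box_cox; it is not the forecast package's implementation, which may differ in details such as how it treats non-positive values.

```r
# Textbook Box-Cox transform for positive data (hypothetical helper, base R only)
box_cox <- function(x, lambda) {
  stopifnot(all(x > 0))
  if (lambda == 0) {
    log(x)                    # limiting case as lambda approaches 0
  } else {
    (x^lambda - 1) / lambda   # power transform for any other lambda
  }
}

# Data from the question, transformed with a square-root-like lambda
vect <- c(99.64, 49.71, 246.84, 96.17, 16.67, 352.00, 421.25, 81.77, 105.00, 37.85)
trans <- box_cox(vect, 0.5)
```

The explicit lambda == 0 branch matters because the power formula divides by lambda, so a function built only on (x^lambda - 1) / lambda cannot be evaluated at zero.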
TheMI answered Oct 24 '22 02:10