Log transform dependent variable for regression tree

Tags: machine-learning, regression, cross-validation

I have a dataset where the dependent (target) variable has a skewed distribution: there are a few very large values and a long tail.

When I run the regression tree, one end node is created for the large-valued observations and another end node is created for the majority of the other observations.

Would it be OK to log-transform the dependent (target) variable and use it for the regression tree analysis? When I tried this, I got a different set of nodes and splits, with a more even distribution of observations in each bucket. With the log transformation, the R-squared value for predicted vs. observed is also quite good; in other words, I seem to get better test and validation performance. I just want to make sure that log transformation is an accepted way to run a regression tree when the dependent variable has a skewed distribution.

Thanks!


asked Jan 30 '15 16:01

airjordan707

1 Answer

Yes. It is completely fine to apply a log transformation to the target variable when it has a skewed distribution. That said, you need to apply the inverse function (the exponential) to the predicted values to get predictions on the original scale of the target.
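To make the transform-then-invert recipe concrete, here is a minimal, library-free sketch. It assumes a single numeric feature and a toy greedy tree; `fit_tree`, `predict_tree`, `fit_log_target_tree` and `predict_log_target_tree` are hypothetical helpers written for illustration, not part of any library. With scikit-learn you would more likely wrap `DecisionTreeRegressor` in `sklearn.compose.TransformedTargetRegressor(func=np.log, inverse_func=np.exp)`, which performs the same fit-on-log, exponentiate-on-predict steps.

```python
import math


def _sse(values):
    """Sum of squared deviations from the mean."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def fit_tree(xs, ys, depth=2, min_leaf=2):
    """Greedily fit a tiny regression tree on one numeric feature.

    A node is either a leaf (the mean of its targets, a float) or a
    tuple (threshold, left_subtree, right_subtree).
    """
    mean = sum(ys) / len(ys)
    if depth == 0 or len(ys) < 2 * min_leaf:
        return mean
    pairs = sorted(zip(xs, ys))
    best = None  # (sse, threshold, split_index)
    for i in range(min_leaf, len(pairs) - min_leaf + 1):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold separates equal feature values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        sse = _sse(left) + _sse(right)
        if best is None or sse < best[0]:
            best = (sse, (pairs[i - 1][0] + pairs[i][0]) / 2, i)
    if best is None:
        return mean
    _, threshold, i = best
    left_x, left_y = zip(*pairs[:i])
    right_x, right_y = zip(*pairs[i:])
    return (threshold,
            fit_tree(left_x, left_y, depth - 1, min_leaf),
            fit_tree(right_x, right_y, depth - 1, min_leaf))


def predict_tree(node, x):
    """Walk down the tree until a leaf value is reached."""
    while isinstance(node, tuple):
        threshold, left, right = node
        node = left if x <= threshold else right
    return node


def fit_log_target_tree(xs, ys, **kwargs):
    """Fit the tree on log(y); requires strictly positive targets."""
    if any(y <= 0 for y in ys):
        raise ValueError("log transform needs strictly positive targets")
    return fit_tree(xs, [math.log(y) for y in ys], **kwargs)


def predict_log_target_tree(node, x):
    """Predict on the log scale, then invert with exp."""
    return math.exp(predict_tree(node, x))


# Toy skewed target (illustrative data, not from the question).
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.0, 2.0, 2.0, 3.0, 5.0, 8.0, 100.0, 400.0]
tree = fit_log_target_tree(xs, ys, depth=2)
preds = [predict_log_target_tree(tree, x) for x in xs]
print(preds)
```

One design point worth checking: each leaf stores the mean of the *log* targets, so exponentiating it returns the geometric mean of that leaf's targets, which is never larger than their arithmetic mean. Back-transformed predictions therefore tend to sit below the raw-scale leaf averages, especially in leaves whose values are widely spread.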

Moreover, you have checked that the transformation gives you a better R-squared. I am assuming you computed R-squared after inverting the log with the exponential function, i.e. on the original scale of the target.
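The scale on which R-squared is computed matters. The sketch below uses made-up toy numbers (not from the question) and a hand-written `r_squared` helper, a hypothetical name, to show that the same predictions can score very differently on the log scale and on the original scale.

```python
import math


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(observed) / len(observed)
    ss_tot = sum((o - mean) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("R-squared is undefined for a constant target")
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return 1 - ss_res / ss_tot


# Toy values; `predicted` is assumed to be already back-transformed with exp.
observed = [1.0, 10.0, 100.0]
predicted = [1.0, 10.0, 10.0]

# Same predictions, evaluated on two different scales.
r2_log_scale = r_squared([math.log(o) for o in observed],
                         [math.log(p) for p in predicted])
r2_original_scale = r_squared(observed, predicted)
print(r2_log_scale, r2_original_scale)
```

Here the log-scale R-squared is 0.5 while the original-scale R-squared is negative, because the single large miss (100 predicted as 10) dominates the squared errors once the log is undone.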

For more details, please refer to the Wikipedia article on data transformation.

Note that if your training data contains any negative (or zero) target values, the log transformation cannot be applied directly, because the logarithm is only defined for positive numbers. You might have to apply some other function that can accept such values.
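As a hedged sketch of such alternatives (the answer does not name a specific one, so these are illustrative choices): `log1p`, i.e. log(1 + y), handles zeros as well as positive values, and a signed log, sign(y) * log(1 + |y|), is defined for every real number. Both are monotone and invertible, so tree predictions can be mapped back with the inverse function just as with the plain log. The Yeo-Johnson transformation, available in scikit-learn as `PowerTransformer(method="yeo-johnson")`, is another option that accepts negative values. The helper names below are hypothetical.

```python
import math


def log1p_target(y):
    """log(1 + y): defined for y > -1, so zero targets are fine."""
    return math.log1p(y)


def log1p_inverse(z):
    """Inverse of log1p_target: exp(z) - 1."""
    return math.expm1(z)


def signed_log(y):
    """sign(y) * log(1 + |y|): defined for every real y, compresses both tails."""
    return math.copysign(math.log1p(abs(y)), y)


def signed_log_inverse(z):
    """Inverse of signed_log: sign(z) * (exp(|z|) - 1)."""
    return math.copysign(math.expm1(abs(z)), z)


# Illustrative targets, including negative and zero values.
values = [-250.0, -1.0, 0.0, 0.5, 1000.0]
transformed = [signed_log(v) for v in values]
restored = [signed_log_inverse(t) for t in transformed]
print(transformed)
print(restored)
```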


answered Oct 22 '22 06:10

Sandeep

