
Machine learning - normalizing features with no theoretical maximum value

What is the best approach to normalize/standardize features that have no theoretical maximum value?

For example, a stock price that has always been between $0 and $1000 could still go higher at any time, so what is the correct approach?

I thought about scaling against a higher maximum (e.g. $2000), but it doesn't feel right: no data would be available for the $1000-$2000 range, and I think this would introduce bias.

asked Oct 14 '17 by Stormsson


People also ask

When should you not normalize data in machine learning?

Not every dataset requires normalization for machine learning. It is needed only when features have very different ranges. For example, consider a dataset containing two features, age and income, where age ranges from 0 to 100 while income ranges from 0 to 100,000 and higher.

How do you normalize features in machine learning?

The most widely used normalization technique in machine learning is min-max scaling: subtract the column's minimum value from each value and divide by the range (maximum minus minimum). Each scaled column then has a minimum value of 0 and a maximum value of 1.
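A minimal sketch of this with scikit-learn's MinMaxScaler, which implements the same formula (the toy data is made up for illustration):

```
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Two features on very different ranges: age (0-100) and income (0-100,000+).
X = np.array([[25, 40_000],
              [47, 120_000],
              [63, 55_000]], dtype=float)

scaler = MinMaxScaler()              # maps each column to [0, 1]
X_scaled = scaler.fit_transform(X)
print(X_scaled)                      # every column now has min 0 and max 1
```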

Do we need to normalize all features?

In distance-based models, if one feature has a much broader range of values than the others, that feature dominates the distance. The range of all features should therefore be normalized so that each feature contributes approximately proportionately to the final distance.

Why do we normalize features in machine learning?

The short answer: it can dramatically improve model accuracy. Normalization gives each variable equal weight, so that no single variable steers model performance in one direction just because its values are larger numbers.


1 Answer

TL;DR: use z-scores, maybe take log, maybe take inverse logit, maybe don't normalize at all.

If you wish to normalize safely, use a monotonic mapping, e.g. (all three are sketched in the snippet after this list):

To map (0, inf) into (-inf, inf), you can use y = log(x)

To map (-inf, inf) into (0, 1), you can use y = 1 / (1 + exp(-x)) (inverse logit)

To map (0, inf) into (0, 1), you can use y = x / (1 + x) (inverse logit after log)
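A quick numpy sketch of the three mappings (the sample values are made up):

```
import numpy as np

x = np.array([0.5, 10.0, 250.0, 1000.0])   # a positive, unbounded feature

y_log = np.log(x)                           # (0, inf) -> (-inf, inf)
y_sigmoid = 1.0 / (1.0 + np.exp(-y_log))    # (-inf, inf) -> (0, 1)
y_bounded = x / (1.0 + x)                   # (0, inf) -> (0, 1); equals sigmoid(log(x))

# All three mappings are monotonic, so the ordering of values is preserved.
assert np.all(np.diff(y_bounded) > 0)
```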

If you don't care about bounds, use a linear mapping: y = (x - m) / s, where m is the mean of your feature and s is its standard deviation. This is called standard scaling, or sometimes z-scoring.
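For example, standard scaling by hand and with scikit-learn's StandardScaler, which implements the same formula (toy values made up):

```
import numpy as np
from sklearn.preprocessing import StandardScaler

x = np.array([120.0, 340.0, 560.0, 980.0]).reshape(-1, 1)  # e.g. stock prices

# By hand: subtract the mean, divide by the standard deviation.
z = (x - x.mean()) / x.std()

# Equivalent with scikit-learn.
z_sk = StandardScaler().fit_transform(x)
assert np.allclose(z, z_sk)
```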

The question you should first ask yourself is: why normalize at all? What are you going to do with your data? Use it as an input feature, or use it as a target to predict?

For an input feature, leaving it unnormalized is OK, unless you apply regularization to the model coefficients (as in Ridge or Lasso), which works best when all the coefficients are on the same scale (that is, after standard scaling).
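If you go that route, one common pattern is to put the scaler and the regularized model into a scikit-learn Pipeline, so the scaling is learned from the training folds only; a sketch with made-up regression data:

```
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

# Scaling inside the pipeline: the penalty treats all coefficients equally,
# and the scaler is refit on the training folds during cross-validation.
model = make_pipeline(StandardScaler(), Ridge(alpha=1.0))
print(cross_val_score(model, X, y, cv=5).mean())
```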

For a target feature, leaving it unnormalized is sometimes also OK.

Additive models (like linear regression or gradient boosting) sometimes work better with symmetric distributions. Distributions of stock values (and money values in general) are often skewed to the right, so taking the log makes them more convenient to model.
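For instance, scikit-learn's TransformedTargetRegressor can fit on log(y) and map predictions back with exp automatically; a sketch on synthetic skewed data:

```
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.ensemble import GradientBoostingRegressor

# Right-skewed, strictly positive target: most values small, a few very large.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 3))
y = np.exp(X[:, 0] + 0.5 * rng.normal(size=300)) * 100

# Fit on log(y), then invert predictions back to the original scale with exp.
model = TransformedTargetRegressor(
    regressor=GradientBoostingRegressor(random_state=0),
    func=np.log,
    inverse_func=np.exp,
)
model.fit(X, y)
print(model.predict(X[:3]))
```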

Finally, if you predict your feature with a neural net whose output unit has a sigmoid activation, the output is inherently bounded to (0, 1). In this case, you might wish the target to be bounded as well. To achieve this, you may use x / (1 + x) as the target: if x is always positive, this value will always lie between 0 and 1, just like the output of the neural net.
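A minimal sketch of this squashing and its inverse, y / (1 - y), using hypothetical helper names squash/unsquash:

```
import numpy as np

def squash(x):
    # (0, inf) -> (0, 1), matching the range of a sigmoid output unit
    return x / (1.0 + x)

def unsquash(y):
    # inverse mapping: (0, 1) -> (0, inf)
    return y / (1.0 - y)

x = np.array([0.1, 1.0, 50.0, 900.0])
y = squash(x)
assert np.allclose(unsquash(y), x)  # round-trips up to float error
# Train the sigmoid-output network on y, then unsquash its predictions.
```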

answered Sep 19 '22 by David Dale