
Machine learning - normalizing features with no theoretical maximum value

What is the best approach to normalize/standardize features that have no theoretical maximum value?

For example, a stock price that has always been between $0 and $1000 could still go higher at any time, so what is the correct approach?

I thought about scaling against a higher maximum (e.g. $2000), but it doesn't feel right: no data would be available for the $1000-$2000 range, and I think this would introduce bias.

asked Oct 14 '17 by Stormsson


People also ask

When should you not normalize data in machine learning?

Not every dataset requires normalization for machine learning. It is needed only when features have very different ranges. For example, consider a dataset containing two features, age and income, where age ranges from 0 to 100 while income ranges from 0 to 100,000 and higher.

How do you normalize features in machine learning?

The most widely used normalization technique in machine learning is min-max scaling: subtract the column's minimum value from each value and divide by the range (maximum minus minimum). Each scaled column then has a minimum value of 0 and a maximum value of 1.
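A minimal sketch of this with scikit-learn's MinMaxScaler, which implements the same formula (the toy data is made up for illustration):

```
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Two features on very different ranges: age (0-100) and income (0-100,000+).
X = np.array([[25, 40_000],
              [47, 120_000],
              [63, 55_000]], dtype=float)

scaler = MinMaxScaler()              # maps each column to [0, 1]
X_scaled = scaler.fit_transform(X)
print(X_scaled)                      # every column now has min 0 and max 1
```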

Do we need to normalize all features?

In distance-based models, if one feature has a much broader range of values than the others, that feature dominates the distance. The range of all features should therefore be normalized so that each feature contributes approximately proportionately to the final distance.

Why do we normalize features in machine learning?

The short answer: it can dramatically improve model accuracy. Normalization gives each variable equal weight, so that no single variable steers model performance in one direction just because its values are larger numbers.


1 Answer

TL;DR: use z-scores, maybe take log, maybe take inverse logit, maybe don't normalize at all.

If you wish to normalize safely, use a monotonic mapping, e.g. (all three are sketched in the snippet after this list):

To map (0, inf) into (-inf, inf), you can use y = log(x)

To map (-inf, inf) into (0, 1), you can use y = 1 / (1 + exp(-x)) (inverse logit)

To map (0, inf) into (0, 1), you can use y = x / (1 + x) (inverse logit after log)
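A quick numpy sketch of the three mappings (the sample values are made up):

```
import numpy as np

x = np.array([0.5, 10.0, 250.0, 1000.0])   # a positive, unbounded feature

y_log = np.log(x)                           # (0, inf) -> (-inf, inf)
y_sigmoid = 1.0 / (1.0 + np.exp(-y_log))    # (-inf, inf) -> (0, 1)
y_bounded = x / (1.0 + x)                   # (0, inf) -> (0, 1); equals sigmoid(log(x))

# All three mappings are monotonic, so the ordering of values is preserved.
assert np.all(np.diff(y_bounded) > 0)
```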

If you don't care about bounds, use a linear mapping: y = (x - m) / s, where m is the mean of your feature and s is its standard deviation. This is called standard scaling, or sometimes z-scoring.
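For example, standard scaling by hand and with scikit-learn's StandardScaler, which implements the same formula (toy values made up):

```
import numpy as np
from sklearn.preprocessing import StandardScaler

x = np.array([120.0, 340.0, 560.0, 980.0]).reshape(-1, 1)  # e.g. stock prices

# By hand: subtract the mean, divide by the standard deviation.
z = (x - x.mean()) / x.std()

# Equivalent with scikit-learn.
z_sk = StandardScaler().fit_transform(x)
assert np.allclose(z, z_sk)
```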

The question you should first ask yourself is: why normalize at all? What are you going to do with your data? Use it as an input feature, or use it as a target to predict?

For an input feature, leaving it unnormalized is OK, unless you apply regularization to the model coefficients (as in Ridge or Lasso), which works best when all the coefficients are on the same scale (that is, after standard scaling).
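If you go that route, one common pattern is to put the scaler and the regularized model into a scikit-learn Pipeline, so the scaling is learned from the training folds only; a sketch with made-up regression data:

```
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

# Scaling inside the pipeline: the penalty treats all coefficients equally,
# and the scaler is refit on the training folds during cross-validation.
model = make_pipeline(StandardScaler(), Ridge(alpha=1.0))
print(cross_val_score(model, X, y, cv=5).mean())
```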

For a target feature, leaving it unnormalized is sometimes also OK.

Additive models (like linear regression or gradient boosting) sometimes work better with symmetric distributions. Distributions of stock values (and money values in general) are often skewed to the right, so taking the log makes them more convenient to model.
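For instance, scikit-learn's TransformedTargetRegressor can fit on log(y) and map predictions back with exp automatically; a sketch on synthetic skewed data:

```
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.ensemble import GradientBoostingRegressor

# Right-skewed, strictly positive target: most values small, a few very large.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 3))
y = np.exp(X[:, 0] + 0.5 * rng.normal(size=300)) * 100

# Fit on log(y), then invert predictions back to the original scale with exp.
model = TransformedTargetRegressor(
    regressor=GradientBoostingRegressor(random_state=0),
    func=np.log,
    inverse_func=np.exp,
)
model.fit(X, y)
print(model.predict(X[:3]))
```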

Finally, if you predict your feature with a neural net whose output unit has a sigmoid activation, the output is inherently bounded to (0, 1). In this case, you might wish the target to be bounded as well. To achieve this, you may use x / (1 + x) as the target: if x is always positive, this value will always lie between 0 and 1, just like the output of the neural net.
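A minimal sketch of this squashing and its inverse, y / (1 - y), using hypothetical helper names squash/unsquash:

```
import numpy as np

def squash(x):
    # (0, inf) -> (0, 1), matching the range of a sigmoid output unit
    return x / (1.0 + x)

def unsquash(y):
    # inverse mapping: (0, 1) -> (0, inf)
    return y / (1.0 - y)

x = np.array([0.1, 1.0, 50.0, 900.0])
y = squash(x)
assert np.allclose(unsquash(y), x)  # round-trips up to float error
# Train the sigmoid-output network on y, then unsquash its predictions.
```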

answered Sep 19 '22 by David Dale