
Normalizing feature values for SVM

I've been playing with some SVM implementations and I am wondering: what is the best way to normalize feature values so that they all fit into one range (from 0 to 1)?

Let's suppose I have 3 features with values in ranges of:

  1. 3 - 5

  2. 0.02 - 0.05

  3. 10 - 15

How do I convert all of those values into the range [0, 1]?

What if, during training, the highest value of feature number 1 that I encounter is 5, and then, after I begin to use my model on much bigger datasets, I stumble upon values as high as 7? In the converted range, that value would exceed 1...
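To make the concern concrete, here is a minimal sketch assuming plain min-max scaling with bounds taken from the training data. The helper `min_max_scale` and the sample values are hypothetical, chosen only to match the ranges above.

```python
def min_max_scale(value, lo, hi):
    """Map value from [lo, hi] to [0, 1] using the training-set bounds."""
    return (value - lo) / (hi - lo)

# Assumed training range for feature 1, as described above: 3 to 5.
train_lo, train_hi = 3.0, 5.0

# A value inside the training range lands inside [0, 1].
in_range = min_max_scale(4.0, train_lo, train_hi)   # 0.5

# A value never seen during training lands outside [0, 1].
unseen = min_max_scale(7.0, train_lo, train_hi)     # 2.0
```

With these bounds, the unseen value 7 maps to 2.0, which is twice the largest scaled value the model saw during training.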

How do I normalize values during training to account for the possibility of "values in the wild" exceeding the highest (or lowest) values the model has "seen" during training? How will the model react to that, and how do I make it work properly when that happens?

user3010273 asked Dec 10 '13 22:12




1 Answer

Besides the scaling-to-unit-length method provided by Tim, standardization (subtracting the mean and dividing by the standard deviation) is the approach most often used in machine learning. Note that when your test data arrives, you should scale it with the mean and standard deviation computed from your training samples rather than recomputing them on the test data. If your training set is large and representative, and the feature is roughly normally distributed, the chance that new test data falls far outside the training range is small. Refer to this post for more details.
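As a rough illustration of standardizing with training statistics only, here is a minimal sketch using Python's standard library. The helper names (`fit_standardizer`, `standardize`) and the sample values are made up for this example; they are not from any particular SVM package.

```python
from statistics import mean, pstdev

def fit_standardizer(train_values):
    """Compute the mean and (population) standard deviation on training data only."""
    mu = mean(train_values)
    sigma = pstdev(train_values)
    # Guard against a constant feature, which would cause division by zero.
    if sigma == 0:
        sigma = 1.0
    return mu, sigma

def standardize(values, mu, sigma):
    """Apply previously fitted statistics to any data (training or test)."""
    return [(v - mu) / sigma for v in values]

# Hypothetical values for feature 1: training covers 3..5, test contains a 7.
train_feature_1 = [3.0, 4.0, 5.0]
test_feature_1 = [4.0, 7.0]

mu, sigma = fit_standardizer(train_feature_1)
train_scaled = standardize(train_feature_1, mu, sigma)
test_scaled = standardize(test_feature_1, mu, sigma)  # reuses training mu and sigma
```

The out-of-range test value 7 is still mapped consistently with the training data; it simply comes out as a larger z-score instead of forcing the scaling to be refitted. In scikit-learn, the same pattern is to call `fit` on a `StandardScaler` with the training set and then `transform` on both the training and test sets.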

lennon310 answered Nov 13 '22 06:11
