 

Why feature scaling in SVM?

I found that feature scaling in SVM (Support Vector Machine) problems really improves performance. I have read this explanation:

The main advantage of scaling is to avoid attributes in greater numeric ranges dominating those in smaller numeric ranges.

Unfortunately this didn't help me. Can somebody provide a better explanation?

Kevin asked Oct 06 '14 21:10


People also ask

Do you need to scale features for SVM?

Because Support Vector Machine (SVM) optimization works by minimizing the norm of the weight vector w, the optimal hyperplane is influenced by the scale of the input features. It is therefore recommended that data be standardized (mean 0, variance 1) before SVM model training.
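A minimal sketch of that standardization step, using only the Python standard library. The `standardize` helper and the sample columns are hypothetical and exist only for illustration; in practice the mean and standard deviation would be computed on the training data and reused unchanged on the test data.

```python
from statistics import mean, pstdev

def standardize(column):
    """Rescale a list of numbers to mean 0 and variance 1 (z-scores)."""
    mu = mean(column)
    sigma = pstdev(column)  # population standard deviation
    if sigma == 0:
        # A constant feature carries no information; map it to zeros.
        return [0.0 for _ in column]
    return [(x - mu) / sigma for x in column]

# Hypothetical feature columns with very different numeric ranges.
ages = [25, 35, 45, 55, 65]
incomes = [30_000, 45_000, 60_000, 75_000, 90_000]

z_ages = standardize(ages)
z_incomes = standardize(incomes)

print(z_ages)
print(z_incomes)
```

After this step both columns live on the same scale, so neither contributes more to the norm of w merely because of its units.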

What is the purpose of feature scaling?

Feature scaling is a method used to normalize the range of independent variables or features of data. In data processing, it is also known as data normalization and is generally performed during the data preprocessing step.

Why is SVM sensitive to feature scaling?

SVMs are sensitive to feature scaling because the margin around the separating hyperplane is measured as a distance in feature space, so features with large values and high variance bias the result.
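The effect on distances can be sketched with three made-up samples. Everything here is an assumption for illustration: the (age, income) points, the feature ranges used for min-max scaling, and the helper names. Distance-based kernels such as RBF depend on the squared distance between samples, so the same reversal carries over to them.

```python
import math

def euclidean(a, b):
    """Plain Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def min_max_scale(point, lows, highs):
    """Map each feature to [0, 1] using assumed lower and upper bounds."""
    return tuple((x - lo) / (hi - lo) for x, lo, hi in zip(point, lows, highs))

# Hypothetical samples: (age in years, income in dollars).
p = (25, 50_000)
q = (65, 50_500)  # very different age, almost the same income
r = (26, 52_000)  # almost the same age, slightly different income

# Unscaled: the income axis dominates, so q looks closer to p than r does.
d_pq = euclidean(p, q)
d_pr = euclidean(p, r)

# Assumed feature ranges for scaling (illustrative, not taken from any data set).
lows, highs = (18, 20_000), (90, 120_000)
sp, sq, sr = (min_max_scale(pt, lows, highs) for pt in (p, q, r))

# Scaled: both features contribute comparably, so r is now the closer sample.
s_pq = euclidean(sp, sq)
s_pr = euclidean(sp, sr)

print(d_pq < d_pr, s_pq > s_pr)
```

Before scaling, a 40-year age gap counts for less than a 2,000-dollar income gap; after scaling the ordering flips.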

What are advantages of feature scaling?

Specifically, for neural network algorithms, feature scaling benefits optimization in three ways: it makes training faster, it helps prevent the optimization from getting stuck in local optima, and it gives a better-shaped error surface.


1 Answer

Feature scaling is a general trick applied to optimization problems (not just SVM). Training an SVM is an optimization problem that can be solved iteratively, for example with gradient descent. Andrew Ng has a great explanation in his Coursera videos.

I will illustrate the core ideas here (I borrow Andrew's slides). Suppose you have only two parameters and one of them can take a relatively large range of values. Then the contours of the cost function can look like very tall and skinny ovals (see blue ovals below). Gradient descent (its path is drawn in red) can then take a long time, going back and forth before it finds the optimal solution.
[image: tall, skinny blue contour ovals with the gradient descent path in red]

If you instead scale your features, the contours of the cost function might look like circles; gradient descent can then take a much straighter path and reach the optimal point much faster.
[image: circular contours with a direct gradient descent path]
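The two pictures can be reproduced roughly with a toy experiment. This is only a sketch under stated assumptions: the cost is a hypothetical quadratic J(w) = 0.5 * (c1 * w1^2 + c2 * w2^2), where each curvature c stands in for the squared scale of the matching feature, and the step size rule (1.8 divided by the largest curvature) is chosen only to keep the iteration stable. None of it comes from the slides.

```python
import math

def gradient_descent_steps(curvatures, start, tol=1e-6, max_steps=100_000):
    """Count plain gradient descent steps until the weights are within tol of 0.

    The cost is J(w) = 0.5 * sum(c_i * w_i**2), so its gradient is c_i * w_i
    and its minimum is at the origin.
    """
    # The step size is limited by the steepest direction; anything above
    # 2 / max(curvatures) would diverge.
    lr = 1.8 / max(curvatures)
    w = list(start)
    for step in range(1, max_steps + 1):
        w = [wi - lr * c * wi for wi, c in zip(w, curvatures)]
        if math.hypot(w[0], w[1]) < tol:
            return step
    return max_steps

start = (1.0, 1.0)

# Unscaled features: one direction is 100 times steeper (tall, skinny ovals).
steps_skinny = gradient_descent_steps((1.0, 100.0), start)

# Scaled features: equal curvature in both directions (circular contours).
steps_round = gradient_descent_steps((1.0, 1.0), start)

print(steps_skinny, steps_round)
```

In the skinny case the step size has to stay small enough for the steep direction, where the iterate flips sign on every step, so progress along the flat direction is slow. With equal curvatures the same rule allows a much larger step and far fewer iterations.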

greeness answered Oct 05 '22 11:10