I have successfully implemented a kernel perceptron classifier that uses an RBF kernel. I understand that the kernel trick maps features to a higher dimension so that a linear hyperplane can be constructed to separate the points. For example, if you have features (x1, x2), you might map them to a 3-dimensional feature space with the degree-2 polynomial map: phi(x1, x2) = (x1^2, sqrt(2)*x1*x2, x2^2).
If you plug that into the perceptron decision function w'x + b = 0, you end up with: w1*x1^2 + w2*sqrt(2)*x1*x2 + w3*x2^2 + b = 0,
which can give you a circular (more generally, elliptical) decision boundary in the original (x1, x2) space.
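To make the equivalence I'm relying on concrete, here is a quick numeric check I put together (plain NumPy; the function names are my own, not from any library): the explicit 3-dimensional map gives exactly the same inner product as the kernel K(x, z) = (x . z)^2 computed in the original 2-D space.

```python
import numpy as np

# Toy check: the explicit degree-2 feature map
# phi(x) = (x1^2, sqrt(2)*x1*x2, x2^2)
# satisfies phi(x) . phi(z) == (x . z)^2, the polynomial kernel value.
def phi(x):
    x1, x2 = x
    return np.array([x1**2, np.sqrt(2) * x1 * x2, x2**2])

x = np.array([1.5, -2.0])
z = np.array([0.3,  4.0])

explicit = phi(x) @ phi(z)   # dot product in the 3-D feature space
kernel   = (x @ z) ** 2      # same number, computed in the original 2-D space

print(explicit, kernel)      # both print 57.0025 (up to floating-point rounding)
```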
While the kernel trick itself is very intuitive, I am not able to understand the linear algebra behind it. Can someone help me understand how we are able to use all of these additional features without explicitly computing them, using just the inner product?
Thanks!
Simple.
Give me the numeric result of (x+y)^10 for some values of x and y.
What would you rather do: "cheat" by summing x+y and then raising that value to the 10th power, or expand out the exact result, writing out
x^10 + 10 x^9 y + 45 x^8 y^2 + 120 x^7 y^3 + 210 x^6 y^4 + 252 x^5 y^5 + 210 x^4 y^6 + 120 x^3 y^7 + 45 x^2 y^8 + 10 x y^9 + y^10,
and then computing each term and adding them together? Clearly we can evaluate the result without ever forming those degree-10 terms explicitly. A polynomial kernel works the same way: it evaluates the dot product between two degree-10 feature expansions without ever forming them.
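If it helps, here is the same idea spelled out in code for a degree-10 polynomial kernel on 2-D inputs (a toy sketch I wrote for this answer, not anything from a library): the "cheat" route and the explicit 11-dimensional expansion agree exactly.

```python
import numpy as np
from math import comb

# For 2-D inputs, the degree-10 polynomial kernel K(u, v) = (u . v)^10
# equals the dot product of explicit 11-dimensional feature vectors whose
# entries are the monomials u1^(10-k) * u2^k weighted by sqrt(C(10, k)).
def phi10(u):
    u1, u2 = u
    return np.array([np.sqrt(comb(10, k)) * u1**(10 - k) * u2**k
                     for k in range(11)])

u = np.array([0.7, -0.2])
v = np.array([1.1,  0.5])

explicit = phi10(u) @ phi10(v)   # "expand everything, then sum" -- 11 terms here,
                                 # and exponentially many in higher dimensions
cheat    = (u @ v) ** 10         # "cheat": one dot product, one power

print(np.isclose(explicit, cheat))   # True
```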
Valid kernels are dot products in some feature space where we can "cheat" and compute the numeric result for two points without having to form their explicit feature vectors. There are many possible kernels, though only a few are widely used in papers and in practice.
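To tie this back to your RBF kernel perceptron: below is a bare-bones sketch (my own illustrative code, the names are mine) where the only thing the algorithm ever touches is K(x_i, x_j). The RBF feature map, which is infinite-dimensional, never appears anywhere.

```python
import numpy as np

# Minimal kernel perceptron: training and prediction only ever call the
# kernel function; the mapped feature vectors are never written down.
def rbf(a, b, gamma=1.0):
    return np.exp(-gamma * np.sum((a - b) ** 2))

def train(X, y, epochs=10):
    n = len(X)
    alpha = np.zeros(n)              # one dual coefficient per training point
    for _ in range(epochs):
        for i in range(n):
            # decision value uses only kernel evaluations
            f = sum(alpha[j] * y[j] * rbf(X[j], X[i]) for j in range(n))
            if y[i] * f <= 0:        # mistake -> bump this point's coefficient
                alpha[i] += 1
    return alpha

def predict(X_train, y, alpha, x):
    f = sum(alpha[j] * y[j] * rbf(X_train[j], x) for j in range(len(X_train)))
    return np.sign(f)

# XOR-like data that no linear boundary separates in the original space
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([-1., 1., 1., -1.])
alpha = train(X, y)
print([predict(X, y, alpha, x) for x in X])   # recovers the labels
```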