In an extract from my textbook it says that reducing the value of K
when running this algorithm actually increases the complexity as it has to run more “smoothing”.
Can anyone explain this to me?
My understanding is that in 1NN
, you feed it your training set. You test on your testing set. Assume your testing set has one point in it. It finds the one point closest to it in the training set and returns the value of this.
Surely this is less complex than finding the 3 closest points in 3NN
, adding their values and dividing by three?
What have I misunderstood or overlooked?
The only parameter that can adjust the complexity of KNN is the number of neighbors k. The larger k is, the smoother the classification boundary. Or we can think of the complexity of KNN as lower when k increases.
If you increase k, the areas predicting each class will be more "smoothed", since it's the majority of the k-nearest neighbours which decide the class of any point.
The number of data points that are taken into consideration is determined by the k value. Thus, the k value is the core of the algorithm. KNN classifier determines the class of a data point by the majority voting principle. If k is set to 5, the classes of 5 closest points are checked.
As k decreases, the bias also decreases, but the model is less stable. Formally, the prediction error (at a given target point x0) can be broken into three parts: the irreducible error, the bias squared, and variance.
I had the same moment of disbelief when reading that axiom ; a parameter of higher value that decreases complexity seems a bit counterintuitive at first.
To put an intuition on this, let's compare a 1-nearest-neighbour trained model, and a N>>1-nearest-neighbours one. Let's use a simplified 2D-plot (two-features dataset) with a binary classification (each "point" has a class, or label, of either A or B).
With the 1-nearest-neighbour model, each example of the training set is potentially the center of an area predicting class A or B, with most of its neighbors the center of an area predicting the other class. Your plot might look like one of those maps of ethnicity, language or religion in the regions of the world where they are deeply intertwined (Balkans or the Middle East comes to mind) : small patches of complex shapes and alternating colors, with no discernible logic, and thus "high complexity".
If you increase k, the areas predicting each class will be more "smoothed", since it's the majority of the k-nearest neighbours which decide the class of any point. Thus the areas will be of lesser number, larger sizes and probably simpler shapes, like the political maps of country borders in the same areas of the world. Thus "less complexity".
(Intuition and source from this course.)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With