
Why does decreasing K in K-nearest-neighbours increase complexity?

In an extract from my textbook, it says that reducing the value of K when running this algorithm actually increases the complexity, as it has to run more "smoothing".

Can anyone explain this to me?

My understanding is that in 1NN, you feed the algorithm your training set and then test it on your testing set. Assume your testing set contains a single point: 1NN finds the one training point closest to it and returns that point's value.

Surely this is less complex than 3NN, where you find the three closest points, add their values and divide by three?
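For concreteness, here is a rough sketch of what I mean (made-up data and a hypothetical brute-force helper, just to illustrate the per-query work):

```python
import numpy as np

def knn_predict(X_train, y_train, x_query, k):
    """Return the average label of the k training points nearest to x_query."""
    # Distance from the query point to every training point.
    distances = np.linalg.norm(X_train - x_query, axis=1)
    # Indices of the k smallest distances.
    nearest = np.argsort(distances)[:k]
    # Average their labels (for classification you would take a majority vote).
    return y_train[nearest].mean()

X_train = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [2.0, 2.0]])
y_train = np.array([0.0, 0.0, 1.0, 1.0])

print(knn_predict(X_train, y_train, np.array([0.2, 0.1]), k=1))  # value of the single nearest point
print(knn_predict(X_train, y_train, np.array([0.2, 0.1]), k=3))  # average of the three nearest values
```

The per-query work is almost identical for k = 1 and k = 3, which is what makes the textbook's claim confusing to me.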

What have I misunderstood or overlooked?

asked May 20 '14 by Simon Kiely




1 Answer

I had the same moment of disbelief when reading that axiom; a higher parameter value that decreases complexity seems a bit counterintuitive at first.

To build an intuition for this, let's compare a model trained with 1 nearest neighbour against one trained with N >> 1 nearest neighbours. Let's use a simplified 2D plot (a two-feature dataset) with binary classification (each "point" has a class, or label, of either A or B).

With the 1-nearest-neighbour model, each example of the training set is potentially the centre of an area predicting class A or B, with most of its neighbours the centres of areas predicting the other class. Your plot might look like one of those maps of ethnicity, language or religion in the regions of the world where they are deeply intertwined (the Balkans or the Middle East come to mind): small patches of complex shapes and alternating colours, with no discernible logic, and thus "high complexity".

[figure: 1-nearest-neighbour decision regions]

If you increase k, the areas predicting each class will be more "smoothed", since it is the majority of the k nearest neighbours that decides the class of any point. Thus the areas will be fewer in number, larger in size and probably simpler in shape, like the political maps of country borders in the same areas of the world. Thus "less complexity".

[figure: k-nearest-neighbours decision regions]

(Intuition and source from this course.)
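If you want to play with this yourself, here is a small sketch (assuming scikit-learn; the synthetic dataset and the grid-based "jaggedness" count are purely illustrative, not how the course figures were produced) comparing k = 1 with k = 25:

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.neighbors import KNeighborsClassifier

# A noisy two-feature binary dataset.
X, y = make_moons(n_samples=300, noise=0.4, random_state=0)

# Predict on a dense grid and count how often the predicted class changes between
# horizontally adjacent cells -- a rough proxy for how jagged the boundary is.
xx, yy = np.meshgrid(np.linspace(-2, 3, 200), np.linspace(-2, 2, 200))
grid = np.c_[xx.ravel(), yy.ravel()]

for k in (1, 25):
    regions = KNeighborsClassifier(n_neighbors=k).fit(X, y).predict(grid).reshape(xx.shape)
    transitions = np.sum(regions[:, 1:] != regions[:, :-1])
    print(f"k={k:2d}: {transitions} class changes across the grid")

# Typically k=1 yields many more transitions (small alternating patches), while
# k=25 yields fewer, larger, smoother regions -- i.e. a less complex model.
```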

answered Nov 09 '22 by Manur