
'Probability' of a K-nearest neighbor like classification

I have a small set of data points (around 10) in a 2D space, each with a category label. I would like to classify a new data point based on the labels of the existing points, and also associate a 'probability' of it belonging to each label class.

Is it appropriate to label the new point with the label of its nearest neighbor (i.e., K-nearest neighbor with K = 1)? To get a probability, I would like to permute all the labels, compute the minimum distance from the unknown point to the remaining points in each case, and take the fraction of cases where that minimum distance is less than or equal to the distance that was used to label it.
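For reference, the K = 1 labelling step can be sketched like this (a minimal NumPy sketch; the point coordinates and labels are made-up example values, not data from the question):

```python
import numpy as np

# Toy 2D data: a handful of labelled points (made-up example values)
points = np.array([[0.0, 0.0], [1.0, 0.5], [0.2, 0.9],
                   [3.0, 3.1], [2.8, 2.5], [3.3, 2.9]])
labels = np.array(['A', 'A', 'A', 'B', 'B', 'B'])

def nn_label(query, points, labels):
    """Return the label of the single nearest neighbour (K = 1)."""
    dists = np.linalg.norm(points - query, axis=1)  # Euclidean distances
    return labels[np.argmin(dists)]

print(nn_label(np.array([0.1, 0.2]), points, labels))  # nearest point is class 'A'
```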

Thanks

WoA asked Feb 08 '11



1 Answer

The nearest-neighbour method is already using Bayes' theorem to estimate the probability, via the points in a ball containing your chosen K points. There is no need for any transformation: the number of points in that ball belonging to each label, divided by the total number of points in the ball, is already an approximation of the posterior probability of that label. In other words:

P(label|x) = P(x|label)P(label) / P(x) = K(label)/K

This follows from applying Bayes' rule to probability densities estimated from a subset of the data. In particular, using:

P(x)·V = K/N (this gives you the probability of a point falling in a ball of volume V around x)

P(x) = K/(N·V) (from the above)

P(x|label) = K(label)/(N(label)·V) (where K(label) and N(label) are the number of points of the given class in the ball and in the total sample, respectively)

and

P(label) = N(label)/N.
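As a quick sanity check of the algebra, plugging in some made-up counts (K = 5 neighbours in the ball, 3 of them of the given label; N = 10 samples, 4 of that label; ball volume V = 1, which cancels out anyway) reproduces K(label)/K:

```python
K, K_label = 5, 3          # points in the ball, and those of the given label
N, N_label = 10, 4         # total samples, and samples of the given label
V = 1.0                    # volume of the ball (cancels out of the posterior)

p_x_given_label = K_label / (N_label * V)   # P(x|label) = K(label)/(N(label)·V)
p_label = N_label / N                       # P(label)   = N(label)/N
p_x = K / (N * V)                           # P(x)       = K/(N·V)

posterior = p_x_given_label * p_label / p_x
print(posterior, K_label / K)               # both equal 0.6 (up to float rounding)
```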

Therefore, just pick a K, calculate the distances, find the K nearest points, and count how many of them carry each label; the counts divided by K give you the probabilities.
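That recipe can be put together in a few lines (a minimal NumPy sketch; the data, query point, and choice of K are made-up examples):

```python
import numpy as np
from collections import Counter

def knn_probabilities(query, points, labels, k):
    """Estimate P(label|query) as K(label)/K over the k nearest points."""
    dists = np.linalg.norm(points - query, axis=1)  # distances to all points
    nearest = np.argsort(dists)[:k]                 # indices of the k closest
    counts = Counter(labels[nearest])               # K(label) for each label
    return {label: n / k for label, n in counts.items()}

# Toy 2D data (made-up example values)
points = np.array([[0.0, 0.0], [1.0, 0.5], [0.2, 0.9],
                   [3.0, 3.1], [2.8, 2.5], [3.3, 2.9]])
labels = np.array(['A', 'A', 'A', 'B', 'B', 'B'])

print(knn_probabilities(np.array([0.5, 0.5]), points, labels, k=3))
# all three nearest points are class 'A', so P(A) = 1.0
```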

Stefio answered Oct 10 '22