
'Probability' of a K-nearest neighbor like classification

I have a small set of data points (around 10) in a 2D space, each with a category label. I would like to classify a new data point based on the labels of the existing points, and also associate a 'probability' of it belonging to each label class.

Is it appropriate to label the new point with the label of its nearest neighbor (i.e., K-nearest neighbor with K = 1)? To get a probability, I would like to permute all the labels, compute the minimum distance from the unknown point to the remaining points in each case, and take the fraction of cases where that minimum distance is less than or equal to the distance that was used to label it.
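For reference, the K = 1 labelling step can be sketched like this (a minimal NumPy sketch; the point coordinates and labels are made-up example values, not data from the question):

```python
import numpy as np

# Toy 2D data: a handful of labelled points (made-up example values)
points = np.array([[0.0, 0.0], [1.0, 0.5], [0.2, 0.9],
                   [3.0, 3.1], [2.8, 2.5], [3.3, 2.9]])
labels = np.array(['A', 'A', 'A', 'B', 'B', 'B'])

def nn_label(query, points, labels):
    """Return the label of the single nearest neighbour (K = 1)."""
    dists = np.linalg.norm(points - query, axis=1)  # Euclidean distances
    return labels[np.argmin(dists)]

print(nn_label(np.array([0.1, 0.2]), points, labels))  # nearest point is class 'A'
```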

Thanks

WoA asked Feb 08 '11



1 Answer

The nearest-neighbour method is already using Bayes' theorem to estimate the probability, via the points in a ball containing your chosen K points. There is no need for any transformation: the number of points in that ball belonging to each label, divided by the total number of points in the ball, is already an approximation of the posterior probability of that label. In other words:

P(label|x) = P(x|label)P(label) / P(x) = K(label)/K

This follows from applying Bayes' rule to probability densities estimated from a subset of the data. In particular, using:

P(x)·V = K/N (this gives you the probability of a point falling in a ball of volume V around x)

P(x) = K/(N·V) (from the above)

P(x|label) = K(label)/(N(label)·V) (where K(label) and N(label) are the number of points of the given class in the ball and in the total sample, respectively)

and

P(label) = N(label)/N.
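As a quick sanity check of the algebra, plugging in some made-up counts (K = 5 neighbours in the ball, 3 of them of the given label; N = 10 samples, 4 of that label; ball volume V = 1, which cancels out anyway) reproduces K(label)/K:

```python
K, K_label = 5, 3          # points in the ball, and those of the given label
N, N_label = 10, 4         # total samples, and samples of the given label
V = 1.0                    # volume of the ball (cancels out of the posterior)

p_x_given_label = K_label / (N_label * V)   # P(x|label) = K(label)/(N(label)·V)
p_label = N_label / N                       # P(label)   = N(label)/N
p_x = K / (N * V)                           # P(x)       = K/(N·V)

posterior = p_x_given_label * p_label / p_x
print(posterior, K_label / K)               # both equal 0.6 (up to float rounding)
```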

Therefore, just pick a K, calculate the distances, find the K nearest points, and count how many of them carry each label; the counts divided by K give you the probabilities.
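That recipe can be put together in a few lines (a minimal NumPy sketch; the data, query point, and choice of K are made-up examples):

```python
import numpy as np
from collections import Counter

def knn_probabilities(query, points, labels, k):
    """Estimate P(label|query) as K(label)/K over the k nearest points."""
    dists = np.linalg.norm(points - query, axis=1)  # distances to all points
    nearest = np.argsort(dists)[:k]                 # indices of the k closest
    counts = Counter(labels[nearest])               # K(label) for each label
    return {label: n / k for label, n in counts.items()}

# Toy 2D data (made-up example values)
points = np.array([[0.0, 0.0], [1.0, 0.5], [0.2, 0.9],
                   [3.0, 3.1], [2.8, 2.5], [3.3, 2.9]])
labels = np.array(['A', 'A', 'A', 'B', 'B', 'B'])

print(knn_probabilities(np.array([0.5, 0.5]), points, labels, k=3))
# all three nearest points are class 'A', so P(A) = 1.0
```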

Stefio answered Oct 10 '22