
Getting probability as 0 or 1 in KNN (predict_proba)

I was using KNN from sklearn and predicted the labels using predict_proba. I was expecting the values in the range of 0 to 1 since it tells the probability for a particular class. But I am only getting 0 & 1.

I have also tried large k values, but to no avail. I have only 1000 samples with around 200 features, and the matrix is largely sparse.

Can anybody tell me what could be the solution here?

Asked by Gagan, Jan 31 '17 11:01



2 Answers

sklearn.neighbors.KNeighborsClassifier(n_neighbors=k)

The reason you're getting only 0 and 1 is most likely the n_neighbors=k parameter. With the default uniform weights, each probability is the fraction of the k nearest neighbors that belong to that class. If k is set to 1, you will get 0 or 1. If it's set to 2, you will get 0, 0.5 or 1. And if it's set to 3, the probability outputs will be 0, 0.333, 0.667 or 1.
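To check this on your own setup, here is a minimal sketch. It assumes the default uniform weights and uses synthetic random data, not the asker's data; the helper name proba_for_k is made up for illustration.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Synthetic two-class data with random labels (an assumption for illustration).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = rng.integers(0, 2, size=200)

def proba_for_k(k):
    """Fit KNN with k neighbors (uniform weights) and return predict_proba on X."""
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X, y)
    return knn.predict_proba(X)

for k in (1, 2, 3):
    # Every value printed here should be a multiple of 1/k.
    print(k, np.unique(np.round(proba_for_k(k), 3)))
```

If you still see only 0 and 1 with a larger k, print knn.get_params() to confirm which n_neighbors value the fitted model is actually using.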

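Also note that the probability values in KNN are just the share of neighbors voting for each class, not calibrated probabilities, so they carry little meaning on their own. The algorithm is based on similarity and distance.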

Answered by numb3rs, Jan 03 '23 15:01


The reason might be a lack of variety in the training and test data.

If a sample's features occur only in one class, and no sample of any other class in the training set shares them, then all of its nearest neighbors come from that class. The sample will be predicted to belong to that class with a probability of 100% (1), and 0% (0) for the other classes. Otherwise, say you have 2 classes and test a sample with knn.predict_proba(sample): you can expect a result like [[0.47, 0.53]]. Either way, the probabilities in each row sum to 1.

If that's the case, try building your own test sample that mixes features from objects of more than one class in the training set.
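A small sketch of that check, using a made-up one-dimensional training set (the points and labels are assumptions chosen so the neighbors are easy to read off): queries deep inside one class get 0 or 1, while a query near the class boundary gets a mixed result.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Toy training set: class 0 at x = 0, 1, 2 and class 1 at x = 3, 4, 5.
X_train = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]])
y_train = np.array([0, 0, 0, 1, 1, 1])

knn = KNeighborsClassifier(n_neighbors=3)  # default uniform weights
knn.fit(X_train, y_train)

# 0.0 and 5.0 sit inside one class; 2.4 lies between the two classes,
# so its 3 nearest neighbors (x = 2, 3, 1) come from both classes.
queries = np.array([[0.0], [2.4], [5.0]])
proba = knn.predict_proba(queries)
print(proba)
```

Only the middle query has neighbors from both classes, so it is the only row that is not all 0s and 1s.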

Answered by Bilal Dadanlar, Jan 03 '23 17:01