
How to limit prediction probability to one class

When using something like this:

from sklearn.neighbors import KNeighborsClassifier

clf = KNeighborsClassifier(n_neighbors=3)
clf.fit(X, y)
predictions = clf.predict_proba(X_test)

How can I limit the prediction to a single class? I need this for performance reasons: for instance, when I have thousands of classes but am only interested in whether one particular class has a high probability.

Arty asked Nov 26 '25 00:11



1 Answer

Sklearn does not implement this, so you will have to write some kind of wrapper. For example, you can extend the KNeighborsClassifier class and override the predict_proba method.

According to the source code, predict_proba is implemented as follows:

    def predict_proba(self, X):
        """Return probability estimates for the test data X.

        Parameters
        ----------
        X : array, shape = (n_samples, n_features)
            A 2-D array representing the test points.

        Returns
        -------
        p : array of shape = [n_samples, n_classes], or a list of n_outputs
            of such arrays if n_outputs > 1.
            The class probabilities of the input samples. Classes are ordered
            by lexicographic order.
        """
        X = atleast2d_or_csr(X)

        neigh_dist, neigh_ind = self.kneighbors(X)

        classes_ = self.classes_
        _y = self._y
        if not self.outputs_2d_:
            _y = self._y.reshape((-1, 1))
            classes_ = [self.classes_]

        n_samples = X.shape[0]

        weights = _get_weights(neigh_dist, self.weights)
        if weights is None:
            weights = np.ones_like(neigh_ind)

        all_rows = np.arange(X.shape[0])
        probabilities = []
        for k, classes_k in enumerate(classes_):
            pred_labels = _y[:, k][neigh_ind]
            proba_k = np.zeros((n_samples, classes_k.size))

            # a simple ':' index doesn't work right
            for i, idx in enumerate(pred_labels.T):  # loop is O(n_neighbors)
                proba_k[all_rows, idx] += weights[:, i]

            # normalize 'votes' into real [0,1] probabilities
            normalizer = proba_k.sum(axis=1)[:, np.newaxis]
            normalizer[normalizer == 0.0] = 1.0
            proba_k /= normalizer

            probabilities.append(proba_k)

        if not self.outputs_2d_:
            probabilities = probabilities[0]

        return probabilities

Simply modify the code so that the for k, classes_k in enumerate(classes_): loop is replaced by the calculation for the one particular class you need.

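A hackier alternative would be to temporarily overwrite the classes_ attribute so that it contains only the class of interest, and restore it once you are done. Note that _y stores indices into classes_, so those would likely have to be remapped as well.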
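
The wrapper idea described above can be sketched roughly as follows. This is only a minimal illustration, not scikit-learn's own API: the class name OneClassProbaKNN, the attribute train_labels_ and the method predict_proba_one are hypothetical names made up for this example. It assumes a single-output problem and the default weights='uniform', and it uses only the public kneighbors method instead of the private helpers shown in the source above.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier


class OneClassProbaKNN(KNeighborsClassifier):
    """Hypothetical subclass that adds a single-class probability method.

    Assumes weights='uniform' and a 1-D label array.
    """

    def fit(self, X, y):
        # Keep our own copy of the training labels so the sketch does not
        # depend on scikit-learn's private attributes.
        self.train_labels_ = np.asarray(y)
        return super().fit(X, y)

    def predict_proba_one(self, X, label):
        # Indices of the k nearest training points for every row of X.
        neigh_ind = self.kneighbors(X, return_distance=False)
        # Boolean matrix: does neighbor j of sample i carry `label`?
        hits = self.train_labels_[neigh_ind] == label
        # With uniform weights, the probability is the fraction of hits.
        return hits.mean(axis=1)
```

Calling clf.predict_proba_one(X_test, some_label) then returns one value per test sample instead of an (n_samples, n_classes) matrix. The neighbor search itself costs the same as before, so any saving comes only from skipping the per-class vote matrix. For weights='distance' the hits would need to be weighted by the neighbor distances, which this sketch does not attempt.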

lejlot answered Nov 28 '25 14:11



