from sklearn.feature_extraction.text import TfidfVectorizer
import numpy as np
from sklearn import linear_model

arr = ['dogs cats lions', 'apple pineapple orange', 'water fire earth air', 'sodium potassium calcium']
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(arr)
feature_names = vectorizer.get_feature_names()  # use get_feature_names_out() on scikit-learn >= 1.0
Y = ['animals', 'fruits', 'elements', 'chemicals']
T = ["eating apple roasted in fire and enjoying fresh air"]
test = vectorizer.transform(T)
clf = linear_model.SGDClassifier(loss='log')  # use loss='log_loss' on scikit-learn >= 1.1
clf.fit(X, Y)
x = clf.predict(test)
print(x)  # ['elements']
In the above code, clf.predict() returns only the single best prediction for each sample. I am interested in the top 3 predictions for a particular sample. I know that predict_proba/predict_log_proba returns the probabilities for every class in list Y, but that list has to be sorted and then associated with the class labels in Y before you get the top 3 results.
Is there any direct and efficient way?
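For reference, the manual approach I am describing would look something like this (a sketch; choosing n = 3 and pairing the probabilities with clf.classes_, which holds the label for each probability column, are my additions):

probs = clf.predict_proba(test)[0]
top3 = sorted(zip(clf.classes_, probs), key=lambda p: p[1], reverse=True)[:3]
print(top3)  # [(label, probability), ...] with the most probable class first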
Some background on scikit-learn's prediction API: scikit-learn provides tools for training and evaluating machine learning models, and, once a model is trained, for predicting output values. The predict method returns the single predicted class for each sample, while predict_proba can be used to infer the class probabilities (i.e. the probability that a particular data point falls into each of the underlying classes). The prediction is only meaningful if the input data contains all the features the model was trained on.
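A minimal sketch of the difference, using the classifier from the question (the printed numbers are illustrative):

pred = clf.predict(test)         # single best class per sample
probs = clf.predict_proba(test)  # shape (1, 4): one probability per class
print(clf.classes_)              # column order of probs: ['animals' 'chemicals' 'elements' 'fruits']
print(pred)                      # e.g. ['elements']
print(probs)                     # e.g. [[0.05 0.10 0.60 0.25]]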
There is no built-in function, but what is wrong with
probs = clf.predict_proba(test)
best_n = np.argsort(probs, axis=1)[-n:]
As suggested in one of the comments, [-n:] should be changed to [:,-n:] so the slice takes the last n columns (classes) of each row rather than the last n rows:
probs = clf.predict_proba(test)
best_n = np.argsort(probs, axis=1)[:,-n:]
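best_n holds column indices into the probability matrix, still in ascending order of probability. To turn them into labels, you can index clf.classes_ (a sketch, assuming the classifier from the question and n as above; reversing so the most probable class comes first):

probs = clf.predict_proba(test)
best_n = np.argsort(probs, axis=1)[:, -n:]
top_labels = clf.classes_[best_n[:, ::-1]]  # most probable class first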
I know this has been answered...but I can add a bit more...
# both preds and truths are the same shape, m by n
# (m is the number of predictions, n is the number of classes);
# truths is assumed to be one-hot encoded
def top_n_accuracy(preds, truths, n):
    # column indices of the n highest-scoring classes per sample
    best_n = np.argsort(preds, axis=1)[:, -n:]
    # index of the true class per sample
    ts = np.argmax(truths, axis=1)
    successes = 0
    for i in range(ts.shape[0]):
        if ts[i] in best_n[i, :]:
            successes += 1
    return float(successes) / ts.shape[0]
It's quick and dirty, but I find it useful. One can add their own error checking, etc.
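For example (toy arrays of my own, just to illustrate the expected shapes):

preds = np.array([[0.1, 0.6, 0.3],
                  [0.5, 0.2, 0.3]])
truths = np.array([[0, 0, 1],   # true class is 2
                   [1, 0, 0]])  # true class is 0
print(top_n_accuracy(preds, truths, 2))  # 1.0, since each true class is in its top 2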
Hopefully, Andreas will help with this. predict_proba is not available when loss='hinge'. To get the top n classes when loss='hinge', wrap the classifier in CalibratedClassifierCV:

from sklearn.calibration import CalibratedClassifierCV

calibrated_clf = CalibratedClassifierCV(clfSDG, cv=3, method='sigmoid')
model = calibrated_clf.fit(train.data, train.label)
probs = model.predict_proba(test_data)
sorted(zip(calibrated_clf.classes_, probs[0]), key=lambda x: x[1])[-n:]
Not sure if clfSDG.predict and calibrated_clf.predict will always predict the same class.
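A quick way to check that concern empirically (a sketch; it assumes clfSDG was also fitted on the same training data):

# fraction of test samples on which the raw and calibrated classifiers agree;
# calibration rescales the per-class scores, so the argmax can change
# in multiclass settings
agreement = np.mean(clfSDG.predict(test_data) == calibrated_clf.predict(test_data))
print(agreement)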
argsort gives results in ascending order; if you want to save yourself from reverse-slicing or awkward loops, you can use a simple trick: negate the probabilities.
probs = clf.predict_proba(test)
best_n = np.argsort(-probs, axis=1)[:, :n]
Negating the probabilities turns the largest values into the smallest, so the first n indices of each row are the top-n results in descending order of probability.
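Combined with clf.classes_, this gives the labels directly, most probable first (a sketch using the question's classifier):

probs = clf.predict_proba(test)
top_labels = clf.classes_[np.argsort(-probs, axis=1)[:, :n]]  # n most probable labels per sample, best first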