
predict_proba for a cross-validated model

I would like to get predicted probabilities from a Logistic Regression model with cross-validation. I know you can get the cross-validation scores, but is it possible to return the values from predict_proba instead of the scores?

# imports
from sklearn.linear_model import LogisticRegression
from sklearn.cross_validation import (StratifiedKFold, cross_val_score,
                                      train_test_split)
from sklearn import datasets

# setup data
iris = datasets.load_iris()
X = iris.data
y = iris.target

# setup model
cv = StratifiedKFold(y, 10)
logreg = LogisticRegression()

# cross-validation scores
scores = cross_val_score(logreg, X, y, cv=cv)

# predict probabilities
Xtrain, Xtest, ytrain, ytest = train_test_split(X, y)
logreg.fit(Xtrain, ytrain)
proba = logreg.predict_proba(Xtest)
Mads Jensen asked Feb 28 '15 at 22:02


1 Answer

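This is now implemented as part of scikit-learn version 0.18. You can pass a 'method' string parameter to the cross_val_predict function. With method='predict_proba', each sample's probabilities come from the model that was fitted on the folds not containing that sample. See the cross_val_predict documentation for details.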

Example:

proba = cross_val_predict(logreg, X, y, cv=cv, method='predict_proba') 

Also note that cross_val_predict is part of the sklearn.model_selection module, which was introduced in 0.18 to replace sklearn.cross_validation, so you will need this import:

from sklearn.model_selection import cross_val_predict 
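
For completeness, here is a minimal end-to-end sketch of the approach above, rewritten against the sklearn.model_selection API. It assumes scikit-learn 0.18 or later; the shuffle, random_state, and max_iter settings are illustrative choices of mine and not part of the original question.

```python
import numpy as np
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_predict

# Same data as in the question
iris = datasets.load_iris()
X, y = iris.data, iris.target

# 10 stratified folds. In model_selection the splitter takes n_splits and
# receives y later via split(); shuffle/random_state are illustrative choices.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

# max_iter is raised only as a precaution against convergence warnings
# (an assumption, not something the question required).
logreg = LogisticRegression(max_iter=1000)

# Each row of proba is predicted by a model fitted on the folds that
# exclude that row, so these are out-of-fold probabilities.
proba = cross_val_predict(logreg, X, y, cv=cv, method="predict_proba")

print(proba.shape)  # (n_samples, n_classes)
print(proba[:3])    # one probability per class, columns in sorted label order
```

Note that the result has one row per sample in X, in the original order, so it can be compared directly against y.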
ronathan answered Sep 21 '22 at 23:09