I would like to get predicted probabilities from a Logistic Regression model with cross-validation. I know you can get the cross-validation scores, but is it possible to return the values from predict_proba instead of the scores?
# imports
# (sklearn.cross_validation was removed in scikit-learn 0.20; sklearn.model_selection replaces it)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import (StratifiedKFold, cross_val_score,
                                     train_test_split)
from sklearn import datasets

# setup data
iris = datasets.load_iris()
X = iris.data
y = iris.target

# setup model
cv = StratifiedKFold(n_splits=10)  # old API: StratifiedKFold(y, 10)
logreg = LogisticRegression()

# cross-validation scores
scores = cross_val_score(logreg, X, y, cv=cv)

# predict probabilities
Xtrain, Xtest, ytrain, ytest = train_test_split(X, y)
logreg.fit(Xtrain, ytrain)
proba = logreg.predict_proba(Xtest)
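Even without a dedicated helper, the out-of-fold probabilities can be collected by hand: fit a fresh clone of the model on each training split and call predict_proba on the matching test split. This is a minimal sketch, assuming scikit-learn 0.18 or later (for sklearn.model_selection) and the iris data from the question; the name oof_proba is just illustrative.

```python
import numpy as np
from sklearn import datasets
from sklearn.base import clone
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold

iris = datasets.load_iris()
X, y = iris.data, iris.target

cv = StratifiedKFold(n_splits=10)
# Larger max_iter gives lbfgs room to converge on the unscaled features
logreg = LogisticRegression(max_iter=1000)

# One row per sample, one column per class, filled in fold by fold.
# Stratification puts every class in every training split, so each fold
# model's columns follow the same classes_ order (0, 1, 2).
oof_proba = np.zeros((X.shape[0], len(np.unique(y))))

for train_idx, test_idx in cv.split(X, y):
    # Unfitted copy, so no fold reuses another fold's fit
    model = clone(logreg).fit(X[train_idx], y[train_idx])
    # Each sample's probabilities come from a model that never saw it
    oof_proba[test_idx] = model.predict_proba(X[test_idx])

print(oof_proba.shape)  # (150, 3)
```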
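For a random forest, predict_proba() averages the per-tree class probabilities; with fully grown trees every leaf is pure, so this amounts to the number of votes for each class divided by the number of trees in the forest. Your precision is therefore exactly 1/n_estimators. If you want to see variation in the 5th decimal digit, you would need 10**5 = 100,000 estimators, which is excessive.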
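A quick way to see this granularity, assuming scikit-learn's RandomForestClassifier with its default settings (trees grown until leaves are pure) and a synthetic dataset from make_classification chosen only for illustration: every predicted probability comes out as a multiple of 1/n_estimators.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Continuous features, so no two samples share a feature vector
X, y = make_classification(n_samples=300, n_features=5, random_state=0)

n_estimators = 10
forest = RandomForestClassifier(n_estimators=n_estimators, random_state=0).fit(X, y)
proba = forest.predict_proba(X[:20])

# With pure leaves each tree outputs a one-hot vector, so the averaged
# probabilities can only take the values k / n_estimators, k = 0..n_estimators
scaled = proba * n_estimators
print(np.allclose(scaled, np.round(scaled)))  # True
print(np.unique(proba))
```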
model.predict_proba(): For classification problems, some estimators also provide this method, which returns the probability that a new observation has each categorical label. In this case, the label with the highest probability is the one returned by model.predict().
The predict method is used to predict the actual class, while the predict_proba method can be used to infer the class probabilities (i.e., the probability that a particular data point falls into each of the underlying classes).
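For LogisticRegression on the question's iris data, the relationship between the two methods can be checked directly: each row of predict_proba sums to 1, its columns follow model.classes_, and picking the most probable column reproduces predict. This is a small sketch; for some other estimators (for example SVC with probability=True) the argmax of predict_proba is not guaranteed to match predict.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000).fit(X, y)

labels = model.predict(X[:5])        # hard class labels
proba = model.predict_proba(X[:5])   # one probability per class, columns ordered as model.classes_

print(proba.round(3))
print(labels)
# The most probable column, mapped back through classes_, gives the predicted label
print(model.classes_[proba.argmax(axis=1)])
```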
Normal cross-validation compares un-aggregated predictions to the ground truth, so it does not evaluate any stabilization gained by aggregating. Thus, for an un-aggregated model, the usual (un-aggregated) cross-validation can be used as an approximation of predictive performance, i.e. as an estimate of the generalization error.
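To make the distinction concrete, here is a sketch assuming scikit-learn 0.20 or later (for cross_validate(..., return_estimator=True)). The usual cross-validation score judges each fold model on its own held-out fold; averaging the predict_proba outputs of all fold models on new data is an aggregated (ensemble) model, which that score does not directly evaluate. The extra hold-out split and the simple averaging scheme are illustrative choices, not part of the original answer.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_validate, train_test_split

X, y = load_iris(return_X_y=True)
# Hold out data that none of the fold models will see
X_dev, X_new, y_dev, y_new = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

cv = StratifiedKFold(n_splits=5)
res = cross_validate(LogisticRegression(max_iter=1000), X_dev, y_dev, cv=cv,
                     return_estimator=True)

# Un-aggregated: each score comes from one fold model on its own test fold
print("per-fold accuracy:", res["test_score"].round(3))

# Aggregated: average the class probabilities of all fold models
fold_models = res["estimator"]
agg_proba = np.mean([m.predict_proba(X_new) for m in fold_models], axis=0)
agg_pred = fold_models[0].classes_[agg_proba.argmax(axis=1)]
print("aggregated accuracy on held-out data:", (agg_pred == y_new).mean().round(3))
```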
This is now implemented as part of scikit-learn version 0.18: you can pass a 'method' string parameter to the cross_val_predict function. See the scikit-learn documentation for cross_val_predict.
Example:
proba = cross_val_predict(logreg, X, y, cv=cv, method='predict_proba')
Also note that this is part of the new sklearn.model_selection package, so you will need this import:
from sklearn.model_selection import cross_val_predict
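Putting the pieces together, here is a self-contained version of the example, assuming scikit-learn 0.18 or later and the question's iris setup with StratifiedKFold rewritten for the model_selection API. Each row of proba holds the class probabilities for one sample, predicted by the fold model that did not train on it, and the columns follow the sorted class labels.

```python
import numpy as np
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_predict

iris = datasets.load_iris()
X, y = iris.data, iris.target

cv = StratifiedKFold(n_splits=10)        # model_selection equivalent of StratifiedKFold(y, 10)
logreg = LogisticRegression(max_iter=1000)

# Out-of-fold class probabilities instead of per-fold scores
proba = cross_val_predict(logreg, X, y, cv=cv, method='predict_proba')

print(proba.shape)   # (150, 3): one row per sample, one column per class
print(proba[:3].round(3))

# Columns are ordered as np.unique(y), i.e. 0, 1, 2 here, so argmax is the label
oof_pred = proba.argmax(axis=1)
print("out-of-fold accuracy:", (oof_pred == y).mean())
```

Because every probability is out-of-fold, these values can be used for things like calibration plots or ROC curves without the optimism of scoring on training data.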