Is there any way to calculate the residual deviance of a scikit-learn logistic regression model? This is a standard output from R model summaries, but I couldn't find it in any of sklearn's documentation.
Instead of the sum of squares, logistic regression uses the deviance: DEV(μ | Y) = −2 log L(μ | Y) + 2 log L(Y | Y), where μ is a location estimator for Y.
Assessing logistic model fit: In logistic regression, as with linear regression, residuals can be defined as observed minus expected values. Because the data are discrete, so are the residuals. As a result, plots of raw residuals from logistic regression are generally not useful.
Deviance residuals: D(y, μ̂) = 2(log p(y | θ̂_s) − log p(y | θ̂_0)), where θ̂_s and θ̂_0 are the parameters of the fitted saturated and proposed models, respectively. A saturated model has as many parameters as it has training points, that is, p = n.
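For a binary (Bernoulli) outcome, the saturated model's log-likelihood is zero, so each observation's deviance residual reduces to a signed square root of its own −2 × log-likelihood. A minimal sketch (the helper name `deviance_residuals` is mine, not from sklearn or the thread):

```python
import numpy as np

def deviance_residuals(y, p_hat, eps=1e-15):
    """Per-observation deviance residuals for a Bernoulli model.

    For binary y, the saturated model's log-likelihood is 0, so each
    residual is sign(y - p_hat) * sqrt(-2 * log-likelihood_i), and the
    squared residuals sum to the total deviance.
    """
    p = np.clip(p_hat, eps, 1 - eps)  # guard against log(0)
    loglik = y * np.log(p) + (1 - y) * np.log(1 - p)
    return np.sign(y - p) * np.sqrt(-2 * loglik)
```

Squaring and summing these residuals recovers the residual deviance reported by R's `summary.glm`.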
The residual deviance shows how well the response is predicted by the model when the predictors are included. From your example, it can be seen that the deviance goes down by 3443.3 when 22 predictor variables are added (note: degrees of freedom = no. of observations − no. of predictors).
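To make the null-vs-residual comparison concrete in sklearn terms, here is a sketch on toy data (the data and variable names are illustrative, not from the question): the null deviance comes from predicting the base rate alone, the residual deviance from the fitted model, and adding informative predictors shrinks the latter.

```python
import numpy as np
from sklearn import metrics
from sklearn.linear_model import LogisticRegression

# Toy data: y depends on the first feature.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X[:, 0] + 0.5 * rng.normal(size=200) > 0).astype(int)

model = LogisticRegression().fit(X, y)

# Null deviance: intercept-only model, i.e. predict the base rate for everyone.
p_null = np.full(len(y), y.mean())
null_dev = 2 * metrics.log_loss(y, p_null, normalize=False)

# Residual deviance: deviance of the fitted model.
resid_dev = 2 * metrics.log_loss(y, model.predict_proba(X), normalize=False)
```

The gap `null_dev - resid_dev` is the drop in deviance attributable to the predictors, which is what an R summary lets you read off.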
Use model.predict_proba together with normalize=False in metrics.log_loss() to return the sum of the per-sample losses. So, to complete @ingo's answer: to obtain the model deviance with sklearn.linear_model.LogisticRegression, you can compute:
from sklearn import metrics

def deviance(X, y, model):
    # Summed log loss is the negative log-likelihood, so deviance = 2 * NLL.
    return 2 * metrics.log_loss(y, model.predict_proba(X), normalize=False)
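As a usage sketch on a real dataset (the dataset choice and pipeline are my own illustration, not part of the answer), you can fit a model and evaluate its deviance like so:

```python
from sklearn import metrics
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def deviance(X, y, model):
    # 2 * summed log loss = -2 * log-likelihood of the fitted model.
    return 2 * metrics.log_loss(y, model.predict_proba(X), normalize=False)

X, y = load_breast_cancer(return_X_y=True)
# Scaling helps the lbfgs solver converge on this dataset.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)).fit(X, y)
dev = deviance(X, y, model)
```

Note that this is the in-sample (residual) deviance; passing held-out X, y gives an out-of-sample analogue.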