When I run a logistic regression with sm.Logit (in the statsmodels library), part of the result looks like this:
Pseudo R-squ.: 0.4335
Log-Likelihood: -291.08
LL-Null: -513.87
LLR p-value: 2.978e-96
How can I interpret the significance of the model, or in other words, its explanatory power? Which indicator should I use? I have searched online and there isn't much information about Pseudo R2 and the LLR p-value. I'm confused about how I can say that my model is good.
Statsmodels provides Logit() for performing logistic regression. Logit() accepts y and X as parameters and returns a Logit model object; calling its fit() method then fits the model to the data.
LL-based pseudo-R2 measures compare the LL of the estimated model with the LL of the null model, which contains no parameters other than the intercept. Pseudo-R2s can then be interpreted as a measure of improvement over the null model in terms of LL, and thus give an indication of goodness of fit.
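Using the two log-likelihoods from the output in the question, McFadden's pseudo-R² (the LL-based version statsmodels reports as Pseudo R-squ.) can be checked by hand. The inputs are the rounded values printed in the summary, so the result only matches the reported 0.4335 up to rounding. The second line shows the "improvement over the null model" reading: how far the fitted model moves the log-likelihood from LL-Null towards the perfect-fit value of 0.

```python
# Rounded values copied from the summary in the question
llf = -291.08     # Log-Likelihood of the fitted model
llnull = -513.87  # LL-Null: log-likelihood of the intercept-only model

# McFadden's pseudo R-squared: 1 - LL(model) / LL(null)
pseudo_r2 = 1 - llf / llnull

# Same quantity written as progress from LL-Null towards 0 (a perfect fit)
improvement = (llf - llnull) / (0 - llnull)

print(f"pseudo R^2  = {pseudo_r2:.4f}")
print(f"improvement = {improvement:.4f}")
```

Both lines print about 0.4336, i.e. the predictors close roughly 43% of the gap between the intercept-only log-likelihood and a perfect fit.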
From Hands-On Machine Learning for Algorithmic Trading:
Log-Likelihood: this is the maximized value of the log-likelihood function.
LL-Null: this is the result of the maximized log-likelihood function when only an intercept is included. It forms the basis for the pseudo-R² statistic and the Log-Likelihood Ratio (LLR) test (see below).
Pseudo-R²: this is a substitute for the familiar R² available under least squares. It is computed based on the ratio of the maximized log-likelihood function for the null model $m_0$ and the full model $m_1$ as follows:
$$R^2_{\text{pseudo}} = 1 - \frac{\log L(m_1)}{\log L(m_0)}$$
The values vary from 0 (when the model does not improve the likelihood) to 1 (when the model fits perfectly and the log-likelihood is maximized at 0). Consequently, higher values indicate a better fit.
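LLR: The LLR test generally compares a more restricted model with a less restricted one and is computed as:
$$LLR = -2\log\frac{L(m_0)}{L(m_1)} = 2\left(\log L(m_1) - \log L(m_0)\right)$$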
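The null hypothesis is that the restricted model is adequate (the additional coefficients are all zero); a low p-value suggests that we can reject this hypothesis and prefer the full model over the null model. The p-value comes from comparing the statistic with a χ² distribution whose degrees of freedom equal the number of restrictions. This is similar to the F-test for linear regression (where we can also use the LLR test when we estimate the model using MLE).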
z-statistic: plays the same role as the t-statistic in the linear regression output and is likewise computed as the ratio of the coefficient estimate to its standard error.
p-values: these indicate the probability of observing a test statistic at least as extreme as the one obtained, assuming the null hypothesis that the population coefficient is zero.
As you can see (and the way I understand it), many of these metrics are counterparts to those in the linear regression case. Furthermore, as Rose already pointed out, I would recommend checking the statsmodels documentation.