I want to compute AIC for linear models to compare their complexity. I did it as follows:
regr = linear_model.LinearRegression()
regr.fit(X, y)
aic_intercept_slope = aic(y, regr.coef_[0] * X.as_matrix() + regr.intercept_, k=1)
def aic(y, y_pred, k):
resid = y - y_pred.ravel()
sse = sum(resid ** 2)
AIC = 2*k - 2*np.log(sse)
return AIC
But I receive a divide by zero encountered in log
error.
AIC = -2(log-likelihood) + 2K Where: K is the number of model parameters (the number of variables in the model plus the intercept). Log-likelihood is a measure of model fit.
The Akaike information criterion (AIC) is a metric that is used to compare the fit of different regression models. It is calculated as: AIC = 2K – 2ln(L)
In regression, AIC is asymptotically optimal for selecting the model with the least mean squared error, under the assumption that the "true model" is not in the candidate set.
The AIC function is 2K – 2(log-likelihood). Lower AIC values indicate a better-fit model, and a model with a delta-AIC (the difference between the two AIC values being compared) of more than -2 is considered significantly better than the model it is being compared to.
sklearn
's LinearRegression
is good for prediction but pretty barebones as you've discovered. (It's often said that sklearn stays away from all things statistical inference.)
statsmodels.regression.linear_model.OLS
has a property attribute AIC
and a number of other pre-canned attributes.
However, note that you'll need to manually add a unit vector to your X
matrix to include an intercept in your model.
from statsmodels.regression.linear_model import OLS
from statsmodels.tools import add_constant
regr = OLS(y, add_constant(X)).fit()
print(regr.aic)
Source is here if you are looking for an alternative way to write manually while still using sklearn
.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With