I'm using ridge regression (RidgeCV), which I've imported with:
from sklearn.linear_model import LinearRegression, RidgeCV, LarsCV, Ridge, Lasso, LassoCV
How do I extract the p-values? I checked, but the fitted ridge model has no summary attribute.
I couldn't find any page that discusses this for Python (I found one for R).
alphas = np.linspace(.00001, 2, 1)
rr_scaled = RidgeCV(alphas = alphas, cv =5, normalize = True)
rr_scaled.fit(X_train, Y_train)
One way to get p-values is with a t-test. Applied to a regression coefficient, this is a two-sided test of the null hypothesis that the true coefficient is zero: the estimate is divided by its standard error, and the resulting t statistic is compared against a t distribution.
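Below is a minimal sketch of that calculation, assuming an ordinary least-squares fit (the helper name ols_coef_pvalues is made up here; note that these classical standard errors do not account for the shrinkage in ridge estimates):

import numpy as np
from scipy import stats as st

def ols_coef_pvalues(X, y):
    """Two-sided t-test p-values for OLS coefficients (intercept first)."""
    X = np.column_stack([np.ones(len(X)), np.asarray(X, dtype=float)])  # add intercept column
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)             # OLS estimates
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - p)                          # residual variance
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))   # standard errors
    t_vals = beta / se                                        # t statistics
    p_vals = 2 * st.t.sf(np.abs(t_vals), df=n - p)            # 2 x upper-tail area past |t|
    return t_vals, p_vals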
This R package provides a simple and efficient method to estimate the p-value of every predictor on a given target variable. The method is based on lasso regression: it compares the point at which each predictor enters the active set of the regularization path against that of a normally distributed null predictor.
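A rough Python analogue of that idea (a sketch only; the helper null_entry_pvalues is made up here and is not the R package's implementation): record the alpha at which each real predictor enters the lasso path, then compare it against the entry points of normally distributed null predictors appended to the design matrix.

import numpy as np
from sklearn.linear_model import lasso_path

def null_entry_pvalues(X, y, n_null=200, random_state=0):
    """Empirical p-values: how often a Gaussian null predictor enters the
    lasso path at least as early as each real predictor does."""
    rng = np.random.default_rng(random_state)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape

    def entry_alphas(Xmat):
        # alpha at which each coefficient first becomes nonzero
        alphas, coefs, _ = lasso_path(Xmat, y)      # alphas come back in decreasing order
        entries = np.zeros(Xmat.shape[1])
        for j in range(Xmat.shape[1]):
            nonzero = np.nonzero(coefs[j])[0]
            entries[j] = alphas[nonzero[0]] if nonzero.size else 0.0
        return entries

    real = entry_alphas(X)
    null = np.empty(n_null)
    for b in range(n_null):
        z = rng.standard_normal(n)                  # one Gaussian null predictor
        null[b] = entry_alphas(np.column_stack([X, z]))[-1]

    # fraction of null predictors entering at least as early (larger alpha = earlier)
    return np.array([(null >= real[j]).mean() for j in range(p)])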
For simple regression, the p-value is determined using a t distribution with n − 2 degrees of freedom (df), written t_(n−2), and is calculated as 2 × the area past |t| under the t_(n−2) curve. For example, with n = 30 observations, df = 30 − 2 = 28.
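For that simple (one-predictor) case, scipy can report the same p-value directly; scipy.stats.linregress uses exactly this t distribution with n − 2 degrees of freedom (the data below are just made up for illustration):

import numpy as np
from scipy import stats as st

rng = np.random.default_rng(0)
x = rng.standard_normal(30)                 # n = 30, so df = 28
y = 0.5 * x + rng.standard_normal(30)
result = st.linregress(x, y)
print(result.slope, result.pvalue)          # two-sided p-value from the t_(28) distribution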
You can use the regressors package to output p values using:
from regressors import stats
stats.coef_pval(rr_scaled, X_train, Y_train)
You can also print out a regression summary (containing std errors, t values, p values, R^2) using:
stats.summary(rr_scaled, X_train, Y_train)
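(If the package isn't installed yet, it should be available on PyPI via pip install regressors.)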
Example:
import numpy as np
import pandas as pd
from sklearn.linear_model import RidgeCV
from regressors import stats

df = pd.DataFrame({'y': np.random.randn(10), 'x1': np.random.randn(10), 'x2': np.random.randn(10)})
# y x1 x2
# 0 -0.228546 0.133703 0.624039
# 1 -1.005794 1.064283 1.527229
# 2 -2.180160 -1.485611 -0.471199
# 3 -0.683695 -0.213433 -0.692055
# 4 -0.451981 -0.133173 0.995683
# 5 -0.166878 -0.384913 0.255065
# 6 0.816602 -0.380910 0.381321
# 7 -0.408240 1.116328 1.163418
# 8 -0.899570 -1.055483 -0.470597
# 9 0.926600 -1.497506 -0.523385
X_train = df[['x1','x2']]
Y_train = df.y
alphas = np.linspace(.00001, 2, 1)  # note: num=1 yields a single alpha (1e-05), so no grid is actually searched
rr_scaled = RidgeCV(alphas=alphas, cv=5, normalize=True)
rr_scaled.fit(X_train, Y_train)
Calling stats.coef_pval:
stats.coef_pval(rr_scaled, X_train, Y_train)
# array([0.17324576, 0.77225007, 0.74614808])
Now, calling stats.summary:
stats.summary(rr_scaled, X_train, Y_train)
# Residuals:
#     Min      1Q  Median      3Q     Max
# -1.3347 -0.2368  0.0038  0.3636  1.7804
#
# Coefficients:
#              Estimate  Std. Error  t value   p value
# _intercept  -0.522607    0.353333  -1.4791  0.173246
# x1          -0.143694    0.481720  -0.2983  0.772250
# x2           0.192431    0.576419   0.3338  0.746148
# ---
# R-squared:  0.00822,    Adjusted R-squared:  -0.27515
# F-statistic: 0.03 on 2 features