I'm trying to understand what f_regression() in the feature selection package does. (http://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.f_regression.html#sklearn.feature_selection.f_regression)
According to the documentation, the first step in f_regression is as follows:
"1. the regressor of interest and the data are orthogonalized wrt constant regressors."
What does this line mean, exactly? What are these constant regressors?
Thanks!
As the scikit-learn documentation notes, f_regression is recommended as a feature selection criterion to identify potentially predictive features for a downstream classifier, irrespective of the sign of the association with the target variable. Furthermore, f_regression returns p-values while r_regression does not.
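For context, here is a minimal sketch (not from the original post) of how the two functions differ on toy data; the random data and coefficients are made up for illustration:

```python
import numpy as np
from sklearn.feature_selection import f_regression, r_regression

rng = np.random.RandomState(0)
X = rng.normal(size=(100, 3))
# Target correlates positively with column 0, negatively with column 1,
# and not at all with column 2.
y = X[:, 0] - X[:, 1] + 0.5 * rng.normal(size=100)

# Signed Pearson correlations (positive for column 0, negative for column 1).
print(r_regression(X, y))

# F scores and p-values; the sign of the association is lost, so strongly
# positive and strongly negative features both get large scores.
F, p = f_regression(X, y)
print(F, p)
```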
It means that the mean is subtracted from both variables.
A constant regressor is a vector full of ones. Whatever this vector can explain in your data is then subtracted out, which leaves a vector with zero sum, i.e. a centered variable.
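A minimal sketch of that idea, assuming a small NumPy array: the least-squares projection of a variable onto the ones vector is its mean, so removing that projection is exactly centering.

```python
import numpy as np

rng = np.random.RandomState(0)
x = rng.normal(loc=5.0, size=10)   # variable with a nonzero mean
ones = np.ones_like(x)             # the "constant regressor"

# Projection of x onto the ones vector equals x.mean() * ones.
projection = (x @ ones) / (ones @ ones) * ones
residual = x - projection          # x orthogonalized wrt the constant regressor

print(np.allclose(residual, x - x.mean()))  # True: residual is the centered x
print(np.isclose(residual.sum(), 0.0))      # True: zero sum
```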
What f_regression essentially computes is a correlation: a scalar product between the centered and appropriately rescaled variables. The resulting score is a function of this value and the degrees of freedom, i.e. the dimensionality of the vectors. The higher the score, the more likely it is that the variables are associated.
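As a sanity check, here is a short sketch (my own, not the library's source) that reproduces f_regression's score from the centered-correlation description above; I assume n - 2 degrees of freedom, which matches what the function reports when centering is on:

```python
import numpy as np
from scipy import stats
from sklearn.feature_selection import f_regression

rng = np.random.RandomState(0)
x = rng.normal(size=50)
y = 2.0 * x + rng.normal(size=50)

# Correlation = scalar product of centered, unit-norm (rescaled) variables.
xc = x - x.mean()
yc = y - y.mean()
r = (xc / np.linalg.norm(xc)) @ (yc / np.linalg.norm(yc))

dof = len(x) - 2                       # degrees of freedom after centering
F_manual = r**2 / (1.0 - r**2) * dof   # convert correlation to an F score
p_manual = stats.f.sf(F_manual, 1, dof)

F, p = f_regression(x.reshape(-1, 1), y)
print(np.allclose(F_manual, F), np.allclose(p_manual, p))  # True True
```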