The OLSResults summary produced by

    import pandas as pd
    import statsmodels.api as sm
    from statsmodels.tools.tools import add_constant

    df2 = pd.read_csv("MultipleRegression.csv")
    X = df2[['Distance', 'CarrierNum', 'Day', 'DayOfBooking']]
    Y = df2['Price']
    X = add_constant(X)
    fit = sm.OLS(Y, X).fit()
    print(fit.summary())

shows the p-value of each attribute to only 3 decimal places.
I need to extract the p-value for each attribute (Distance, CarrierNum, etc.) and print it in scientific notation.
I can extract the coefficients using fit.params[0], fit.params[1], etc.
I need the same for all of their p-values.
Also, what does it mean when all the p-values are shown as 0?
One way to get a p-value is with a t-test. The one-sample t-test is a two-sided test of the null hypothesis that the expected value (mean) of a sample of independent observations, a, is equal to a given population mean, popmean.
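The one-sample test described above matches scipy.stats.ttest_1samp. A minimal sketch, assuming SciPy and NumPy are installed; the sample a is synthetic and the popmean values are chosen only for illustration:

```python
import numpy as np
from scipy import stats

# Synthetic sample (illustrative only): 50 draws centred on 5.0.
rng = np.random.default_rng(42)
a = rng.normal(loc=5.0, scale=1.0, size=50)

# Two-sided test of H0: mean(a) == popmean.
same_mean = stats.ttest_1samp(a, popmean=5.0)
far_mean = stats.ttest_1samp(a, popmean=0.0)

print(f"popmean=5.0: t = {same_mean.statistic:.3f}, p = {same_mean.pvalue:.3e}")
print(f"popmean=0.0: t = {far_mean.statistic:.3f}, p = {far_mean.pvalue:.3e}")
```

Note that this tests a sample mean, not a regression coefficient; for the OLS fit in the question the per-coefficient p-values come from the fitted results object rather than from ttest_1samp.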
The p-value is a statistic used to decide whether there is a relationship between a predictor and the response, for example between Average_Pulse and Calorie_Burnage. We test whether the true value of the coefficient is equal to zero (no relationship). This procedure is called hypothesis testing.
For simple regression, the p-value is determined using a t distribution with n − 2 degrees of freedom (df), written $t_{n-2}$, and is calculated as 2 × the area past |t| under the $t_{n-2}$ curve. In this example, df = 30 − 2 = 28. For multiple regression with k predictors plus an intercept, df = n − k − 1.
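The "2 × area past |t|" rule can be sketched with SciPy's t distribution. The helper name two_sided_p_value and the t statistic of 2.5 are assumptions made for illustration; only n = 30 comes from the text above.

```python
from scipy import stats

def two_sided_p_value(t_stat: float, df: int) -> float:
    """Return 2 x the upper-tail area past |t_stat| under a t distribution with df degrees of freedom."""
    # sf is the survival function, 1 - cdf, i.e. the area to the right of |t_stat|.
    return 2.0 * stats.t.sf(abs(t_stat), df)

n = 30            # number of observations, as in the text
df = n - 2        # simple regression: slope + intercept use two degrees of freedom
p = two_sided_p_value(2.5, df)   # 2.5 is a hypothetical t statistic
print(f"df = {df}, p = {p:.3e}")
```

Using sf instead of 1 - cdf keeps precision for very large |t|, which matters when the p-value is far below what a 3-decimal summary can show.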
So the p-value for each coefficient tells you whether that variable is statistically significant for predicting the target. As a general rule of thumb, a p-value below 0.05 is taken as evidence of a relationship between the variable and the target; it indicates statistical significance, not how strong the relationship is.
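Use fit.pvalues[i], where i is the index of the term in the model: fit.pvalues[0] for the intercept, fit.pvalues[1] for Distance, etc. Because X is a DataFrame, fit.pvalues is a pandas Series, so you can also look a value up by label, e.g. fit.pvalues['Distance'], or by position with fit.pvalues.iloc[1], which avoids the deprecation warning that newer pandas versions raise for integer keys on a labelled Series.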
You can also list all the attributes of an object using dir(<object>).