How to get the regression intercept using Statsmodels.api

I am trying to calculate a regression using a Python library, but I am unable to get the intercept value when I use this library:

import statsmodels.api as sm

It prints the full regression analysis except the intercept.

But when I use:

from pandas.stats.api import ols

My code for pandas:

Regression = ols(y=Sorted_Data3['net_realization_rate'], x=Sorted_Data3[['Cohort_2', 'Cohort_3']])
print(Regression)

I get the intercept, but with a warning that this library will be deprecated in the future, so I am trying to use statsmodels instead.

The warning I get while using pandas.stats.api:

Warning (from warnings module):
  File "C:\Python27\lib\idlelib\run.py", line 325
    exec code in self.locals
FutureWarning: The pandas.stats.ols module is deprecated and will be removed in a future version. We refer to external packages like statsmodels, see some examples here: http://statsmodels.sourceforge.net/stable/regression.html

My code for Statsmodels:

import pandas as pd
import numpy as np
from pandas.stats.api import ols
import statsmodels.api as sm

Data1 = pd.read_csv(r'C:\Shank\Regression.csv')  # Importing CSV (raw string keeps the backslashes literal)
print(Data1)

# ... running some cleaning code that produces Sorted_Data3 ...

sm_model = sm.OLS(Sorted_Data3['net_realization_rate'], Sorted_Data3[['Cohort_2', 'Cohort_3']])
results = sm_model.fit()
print('\n')
print(results.summary())

I even tried statsmodels.formula.api, as follows:

sm_model = sm.OLS(formula="net_realization_rate ~ Cohort_2 + Cohort_3", data=Sorted_Data3)
results = sm_model.fit()
print('\n')
print(results.params)
print('\n')
print(results.summary())

But I get the error:

TypeError: __init__() takes at least 2 arguments (1 given)
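For reference, the TypeError arises because sm.OLS expects the response (endog) as its first positional argument and does not parse formula strings. The formula interface is the lowercase ols function in statsmodels.formula.api, which, like R, adds an Intercept term automatically. A minimal sketch, assuming a small made-up DataFrame (df, with invented values) standing in for Sorted_Data3:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical stand-in for Sorted_Data3: same column names, made-up values
rng = np.random.default_rng(0)
n = 50
df = pd.DataFrame({
    "Cohort_2": rng.normal(size=n),
    "Cohort_3": rng.normal(size=n),
})
# Response generated with a known intercept of 5.0 plus small noise
df["net_realization_rate"] = (
    5.0 + 2.0 * df["Cohort_2"] - 1.0 * df["Cohort_3"]
    + rng.normal(scale=0.1, size=n)
)

# smf.ols parses the formula; the intercept term is included by default
results = smf.ols("net_realization_rate ~ Cohort_2 + Cohort_3", data=df).fit()
print(results.params)
```

The estimated coefficients appear under the name Intercept, next to Cohort_2 and Cohort_3. Writing "- 1" in the formula would drop the intercept again.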

Final output: the first is from pandas, the second from statsmodels. I want statsmodels to report the intercept value the same way pandas does. [image: regression output from pandas and from statsmodels]

asked Aug 08 '16 18:08 by Shank



2 Answers

So, statsmodels has an add_constant function that you need to use to explicitly add an intercept column. IMHO, this is better than the R alternative, where the intercept is added by default.

In your case, you need to do this:

import statsmodels.api as sm

endog = Sorted_Data3['net_realization_rate']
exog = sm.add_constant(Sorted_Data3[['Cohort_2', 'Cohort_3']])

# Fit and summarize OLS model
mod = sm.OLS(endog, exog)
results = mod.fit()
print(results.summary())

Note that you can put the constant column before your data or after it by passing prepend=True (the default) or prepend=False to sm.add_constant.


Or, although it is not recommended, you can use NumPy to add a constant column explicitly, like so:

import numpy as np

# Stack a column of ones (the intercept) in front of the regressor values
exog = np.concatenate((np.repeat(1, len(Sorted_Data3))[:, None],
                       Sorted_Data3[['Cohort_2', 'Cohort_3']].values),
                      axis=1)
answered Sep 16 '22 14:09 by Kartik


You can also do something like this:

df['intercept'] = 1

Here you are explicitly creating a column for the intercept.

Then you can just use sm.OLS like so:

import statsmodels.api as sm

lm = sm.OLS(df['y_column'], df[['intercept', 'x_column']])
results = lm.fit()
print(results.summary())
answered Sep 19 '22 14:09 by Cody Mitchell