I am running an OLS regression with pandas.stats.api.ols inside a groupby, using the following code:
import pandas as pd
from pandas import DataFrame
from pandas.stats.api import ols

df = pd.read_csv(r'F:\file.csv')
result = df.groupby(['FID']).apply(lambda d: ols(y=d.loc[:, 'MEAN'], x=d.loc[:, ['Accum_Prcp', 'Accum_HDD']]))
for i in result:
    x = pd.DataFrame({'FID': i.index, 'delete': i.values})
    frame = pd.concat([x, DataFrame(x['delete'].tolist())], axis=1, join='outer')
    del frame['delete']
    print frame
but this returns the error:
AttributeError: 'OLS' object has no attribute 'index'
I have about 2,000 items in my group by and when I print each one out they look something like this:
-------------------------Summary of Regression Analysis-------------------------
Formula: Y ~ <Accum_Prcp> + <Accum_HDD> + <intercept>
Number of Observations:         79
Number of Degrees of Freedom:   3
R-squared:         0.1242
Adj R-squared:     0.1012
Rmse:              0.1929
F-stat (2, 76):     5.3890, p-value:     0.0065
Degrees of Freedom: model 2, resid 76
-----------------------Summary of Estimated Coefficients------------------------
      Variable       Coef    Std Err     t-stat    p-value    CI 2.5%   CI 97.5%
--------------------------------------------------------------------------------
    Accum_Prcp     0.0009     0.0003       3.28     0.0016     0.0004     0.0015
     Accum_HDD     0.0000     0.0000       1.98     0.0516     0.0000     0.0000
     intercept     0.4750     0.0811       5.86     0.0000     0.3161     0.6340
---------------------------------End of Summary---------------------------------
I want to be able to export each one to a csv so that I can view them individually.
OLS is a common technique for fitting linear regression models. In brief, it measures error as the difference between each observed point in your data set and the value predicted by the fitted line, and it picks the line that minimizes the sum of those squared differences.
Ordinary least squares (OLS) regression is a method for finding the line that best describes the relationship between one or more predictor variables and a response variable. For a single predictor it produces the equation ŷ = b0 + b1x, where ŷ is the estimated response value, b0 is the intercept, and b1 is the slope.
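To make the equation above concrete, here is a minimal pure-Python sketch of the single-predictor case. The function name fit_simple_ols and the sample data are made up for illustration; this is the textbook closed-form solution, not what pandas or statsmodels do internally.

```python
def fit_simple_ols(xs, ys):
    """Return (b0, b1) minimizing the sum of squared residuals for y = b0 + b1*x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Sum of squared deviations of x, and of cross-deviations of x and y
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    b1 = sxy / sxx               # slope
    b0 = y_mean - b1 * x_mean    # intercept
    return b0, b1


# Hypothetical data lying exactly on y = 1 + 2x
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
b0, b1 = fit_simple_ols(xs, ys)
print(b0, b1)  # 1.0 2.0
```

With more than one predictor (as in the question, which uses Accum_Prcp and Accum_HDD) the same idea applies, but the coefficients are normally solved for with a linear-algebra routine rather than by hand.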
The OLS class in the statsmodels.api module is used to perform OLS regression. Calling OLS(...) returns a model object; calling the fit() method on that object then fits the regression line to the data and returns the results.
As of statsmodels 0.9, the Summary class supports export to multiple formats, including CSV and text:
import numpy as np
import statsmodels.api as sm
import statsmodels.formula.api as smf

dat = sm.datasets.get_rdataset("Guerry", "HistData").data
results = smf.ols('Lottery ~ Literacy + np.log(Pop1831)', data=dat).fit()

with open('summary.txt', 'w') as fh:
    fh.write(results.summary().as_text())

with open('summary.csv', 'w') as fh:
    fh.write(results.summary().as_csv())
The output of as_csv() is not machine-readable. Dumping the results parameters with repr() would be.
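If the goal is a CSV that other tools can parse, one option (instead of repr()) is to collect the fitted coefficients of each group into a single table. The sketch below assumes statsmodels' formula API in place of pandas.stats.api.ols, reuses the column names from the question, and runs on made-up, noise-free data rather than the real file; the names rows, coef_table and csv_text are only illustrative.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in for the CSV in the question (all values are made up).
rng = np.random.default_rng(0)
true_coefs = {1: (0.5, 0.001, 0.0002), 2: (0.3, 0.002, 0.0001)}
frames = []
for fid, (a, b, c) in true_coefs.items():
    prcp = rng.uniform(0, 500, size=30)
    hdd = rng.uniform(0, 3000, size=30)
    frames.append(pd.DataFrame({
        "FID": fid,
        "Accum_Prcp": prcp,
        "Accum_HDD": hdd,
        "MEAN": a + b * prcp + c * hdd,  # noise-free, so the fit recovers a, b, c
    }))
df = pd.concat(frames, ignore_index=True)

# One regression per FID; keep one row of plain numbers per group.
rows = []
for fid, grp in df.groupby("FID"):
    res = smf.ols("MEAN ~ Accum_Prcp + Accum_HDD", data=grp).fit()
    row = {"FID": fid, "nobs": int(res.nobs), "rsquared": res.rsquared}
    row.update(res.params.to_dict())  # Intercept, Accum_Prcp, Accum_HDD
    rows.append(row)

coef_table = pd.DataFrame(rows)
csv_text = coef_table.to_csv(index=False)  # pass a file path instead to write to disk
print(csv_text)
```

This gives one row per group, which is easier to scan for about 2,000 groups than 2,000 separate files. Standard errors or p-values could be added to each row in the same way from res.bse and res.pvalues.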
In order to write out the result of pandas.stats.api.ols, use a text file to match the output format, for instance:
from pandas.stats.api import ols

# df is the DataFrame loaded from the CSV in the question
grps = df.groupby(['FID'])
for fid, grp in grps:
    result = ols(y=grp.loc[:, 'MEAN'], x=grp.loc[:, ['Accum_Prcp', 'Accum_HDD']])
    # one text file per group; the with block closes the file even on error
    with open("Output {}.txt".format(fid), "w") as text_file:
        text_file.write(result.summary)