I would like to know a way of performing ANCOVA(analysis of covariance) using Python with scipy. It is basically a statistical comparison of regression lines. I know Python can do ANOVA and it can also do regression line fitting with Scipy.stats. I'm not sure how to put those together to get an effective ANCOVA though, if it is possible.
ANCOVA can be done with regression an using dummy variables in the design matrix for the effects that depend on the categorical variable.
A simple example is at http://groups.google.com/group/pystatsmodels/browse_thread/thread/aaa31b08f3df1a69?hl=en using the OLS class from scikits.statsmodels
Relevant part of construction of design matrix xg includes group numbers/labels, x1 is continuous explanatory variable
>>> dummy = (xg[:,None] == np.unique(xg)).astype(float)
>>> X = np.c_[x1, dummy[:,1:], np.ones(nsample)]
Estimate the model
>>> res2 = sm.OLS(y, X).fit()
>>> print res2.params
[ 1.00901524 3.08466166 -2.84716135 9.94655423]
>>> print res2.bse
[ 0.07499873 0.71217506 1.16037215 0.38826843]
>>> prstd, iv_l, iv_u = wls_prediction_std(res2)
"Test hypothesis that all groups have same intercept"
>>> R = [[0, 1, 0, 0],
... [0, 0, 1, 0]]
>>> print res2.f_test(R)
<F test: F=array([[ 91.69986847]]), p=[[ 8.90826383e-17]],
df_denom=46, df_num=2>
strongly rejected because differences in intercept are very large
Update (two and a half years later):
scikits.statsmodels
has been renamed to statsmodels
and to the question:
With the latest release of statsmodels, it is more convenient to use formulas for specifying categorical effects and interaction effects. statsmodels
uses patsy to handle the formulas and creates the design matrices.
More information is available at the links to the statsmodels documentation in https://stackoverflow.com/a/19495920/333700
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With