I'm experimenting to decide whether a time series (as in, one list of floats) is correlated with itself. I've already had a play with the acf function in statsmodels (http://statsmodels.sourceforge.net/devel/generated/statsmodels.tsa.stattools.acf.html), and now I'm looking at whether the Durbin–Watson statistic has any worth.
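For reference, the standard definition of the Durbin–Watson statistic works on the residuals e_1, ..., e_T of a fitted regression, not on the raw series. The approximation on the right relates it to the lag-1 sample autocorrelation of those residuals, written here as \hat\rho_1:

$$
d = \frac{\sum_{t=2}^{T} (e_t - e_{t-1})^2}{\sum_{t=1}^{T} e_t^2} \approx 2\,(1 - \hat\rho_1)
$$

So d lies between 0 and 4. A value near 2 means little lag-1 autocorrelation, values towards 0 indicate positive autocorrelation, and values towards 4 indicate negative autocorrelation.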
It seems like this kind of thing should work:
from statsmodels.regression.linear_model import OLS
import numpy as np
data = np.arange(100) # this should be highly correlated
ols_res = OLS(data).fit()
dw_res = np.sum(np.diff(ols_res.resid) ** 2) / np.sum(ols_res.resid ** 2)
If you were to run this, you would get:
Traceback (most recent call last):
...
File "/usr/lib/pymodules/python2.7/statsmodels/regression/linear_model.py", line 165, in initialize
self.nobs = float(self.wexog.shape[0])
AttributeError: 'NoneType' object has no attribute 'shape'
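The last line of the snippet is an attempt at computing the statistic by hand. As a minimal NumPy sketch of the textbook formula (the helper name `durbin_watson_manual` is hypothetical), the statistic is the sum of squared successive differences of the residuals divided by their sum of squares. As far as I know this is the same quantity that `statsmodels.stats.stattools.durbin_watson` returns:

```python
import numpy as np

def durbin_watson_manual(resid):
    """Hypothetical helper: Durbin-Watson statistic of a residual series."""
    resid = np.asarray(resid, dtype=float)
    # Numerator: squared differences between consecutive residuals.
    # Denominator: residual sum of squares.
    return np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)

# Alternating residuals are negatively autocorrelated, so d lands above 2.
print(durbin_watson_manual([1.0, -1.0, 1.0, -1.0]))  # 3.0
```

Note that this needs residuals from a model that has actually been fitted, which is where the traceback above gets in the way.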
It seems that D/W is usually used to compare two time series (e.g. http://connor-johnson.com/2014/02/18/linear-regression-with-python/) for correlation, so I think the problem is that I've not passed another time series to compare against. Perhaps this is supposed to be passed in the exog parameter to OLS?
exog : array-like
A nobs x k array where nobs is the number of observations and k is
the number of regressors.
(from http://statsmodels.sourceforge.net/devel/generated/statsmodels.regression.linear_model.OLS.html)
Side-note: I'm not sure what a "nobs x k" array means. Maybe an array that is nobs by k?
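To illustrate the shape in the docstring: "nobs x k" means nobs rows (one per observation) and k columns (one per regressor). A small NumPy sketch, where the choice of regressors is only an example:

```python
import numpy as np

nobs = 100

# k = 1: a single column of ones, i.e. an intercept-only model.
exog_const = np.ones((nobs, 1))

# k = 2: an intercept plus a linear time trend, one regressor per column.
exog_trend = np.column_stack([np.ones(nobs), np.arange(nobs)])

print(exog_const.shape)  # (100, 1)
print(exog_trend.shape)  # (100, 2)
```

A 1-D array of length nobs, such as `np.ones(len(data))` in the follow-up below, appears to be accepted by OLS as a single regressor (k = 1).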
So what should I be doing here? Am I expected to pass the data twice, or to lag it manually myself, or something else?
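Lagging the series by hand is essentially how the lag-1 sample autocorrelation is computed: pair each demeaned value with its predecessor and normalise by the total sum of squares. The sketch below (hypothetical helper `lag1_autocorr`) uses the usual biased estimator, which I believe matches what acf reports at lag 1 with its default settings. Via d ≈ 2(1 − r1), it carries roughly the same information as the Durbin–Watson statistic:

```python
import numpy as np

def lag1_autocorr(series):
    """Hypothetical helper: lag-1 sample autocorrelation (biased estimator)."""
    x = np.asarray(series, dtype=float)
    x = x - x.mean()                 # work with deviations from the mean
    lagged, current = x[:-1], x[1:]  # pair each value with its predecessor
    return np.sum(current * lagged) / np.sum(x ** 2)

print(lag1_autocorr(np.arange(100)))          # close to 1
print(lag1_autocorr([1.0, -1.0, 1.0, -1.0]))  # -0.75
```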
Thanks!
I've accepted user333700's answer, but I wanted to post a follow-up code snippet.
This small program computes the Durbin–Watson statistic for a linear range (which should be highly autocorrelated, giving a value close to 0) and then for random values (which should not be autocorrelated, giving a value close to 2):
from statsmodels.regression.linear_model import OLS
import numpy as np
from statsmodels.stats.stattools import durbin_watson
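def dw(data):
    # Regress on a constant (a column of ones) so OLS has an exog;
    # the residuals are then the data minus its mean.
    ols_res = OLS(data, np.ones(len(data))).fit()
    return durbin_watson(ols_res.resid)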
print("dw of range=%f" % dw(np.arange(2000)))
print("dw of rand=%f" % dw(np.random.randn(2000)))
When run:
dw of range=0.000003
dw of rand=2.036162
So I think that looks good :)
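Because the only regressor is a constant, OLS fits just the mean, so the residuals are simply the demeaned data. A NumPy-only sketch of the same calculation, without statsmodels (the helper name `dw_constant_only` is hypothetical, and the random draws come from a seeded generator, so they differ from the run above):

```python
import numpy as np

def dw_constant_only(data):
    """Hypothetical helper: Durbin-Watson of an intercept-only OLS fit."""
    # OLS on a column of ones fits only the mean, so the residuals
    # are just the demeaned data.
    resid = np.asarray(data, dtype=float)
    resid = resid - resid.mean()
    return np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)

rng = np.random.default_rng(0)
print("dw of range=%f" % dw_constant_only(np.arange(2000)))  # dw of range=0.000003
print("dw of rand=%f" % dw_constant_only(rng.standard_normal(2000)))
```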