Durbin–Watson statistic for one dimensional time series data

I'm experimenting to decide if a time-series (as in, one list of floats) is correlated with itself. I've already had a play with the acf function in statsmodels (http://statsmodels.sourceforge.net/devel/generated/statsmodels.tsa.stattools.acf.html), now I'm looking at whether the Durbin–Watson statistic has any worth.

It seems like this kind of thing should work:

from statsmodels.regression.linear_model import OLS
import numpy as np

data = np.arange(100)  # this should be highly correlated
ols_res = OLS(data)
dw_res = np.sum(np.diff(ols_res.resid.values))

If you were to run this, you would get:

Traceback (most recent call last):
...
  File "/usr/lib/pymodules/python2.7/statsmodels/regression/linear_model.py", line 165, in initialize
    self.nobs = float(self.wexog.shape[0])
AttributeError: 'NoneType' object has no attribute 'shape'

It seems that Durbin–Watson is usually used to compare two time-series for correlation (e.g. http://connor-johnson.com/2014/02/18/linear-regression-with-python/), so I think the problem is that I've not passed another time-series to compare against. Perhaps this is supposed to be passed in the exog parameter to OLS?

exog : array-like

A nobs x k array where nobs is the number of observations and k is
the number of regressors.

(from http://statsmodels.sourceforge.net/devel/generated/statsmodels.regression.linear_model.OLS.html)

Side-note: I'm not sure what a "nobs x k" array means. Maybe an array which is nobs by k?
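In NumPy terms, "nobs x k" describes a 2-D array of shape `(nobs, k)`: one row per observation and one column per regressor. The small illustration below builds two such design matrices; the second column of `exog_two` is an arbitrary, hypothetical regressor chosen only to show how `k` grows with extra columns.

```python
import numpy as np

data = np.arange(100, dtype=float)  # one series, so nobs = 100

# Intercept-only design: a single column of ones, shape (nobs, 1), so k = 1
exog_const = np.ones((len(data), 1))

# Intercept plus one extra (hypothetical) regressor, shape (nobs, 2), so k = 2
exog_two = np.column_stack([np.ones(len(data)), np.arange(len(data))])

print(exog_const.shape)  # (100, 1)
print(exog_two.shape)    # (100, 2)
```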

So what should I be doing here? Am I expected to pass the data twice, to lag it manually myself, or something else?

Thanks!

Edd Barrett asked Apr 10 '17 11:04


1 Answer

I've accepted user333700's answer, but I wanted to post a code snippet as a follow-up.

This small program computes the Durbin–Watson statistic for a linear range (which should be highly autocorrelated, giving a value close to 0) and then for random values (which should not be autocorrelated, giving a value close to 2):

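import numpy as np
from statsmodels.regression.linear_model import OLS
from statsmodels.stats.stattools import durbin_watson


def dw(data):
    # Regress the series on a constant only (exog is a single column of ones),
    # so the residuals are the series minus its mean.
    ols_res = OLS(data, np.ones(len(data))).fit()
    # Durbin-Watson: sum of squared successive residual differences / RSS.
    return durbin_watson(ols_res.resid)


print("dw of range=%f" % dw(np.arange(2000)))       # linear ramp: expect ~0
print("dw of rand=%f" % dw(np.random.randn(2000)))  # white noise: expect ~2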

When run:

dw of range=0.000003
dw of rand=2.036162

So I think that looks good :)
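The statistic is simple enough to compute by hand, which can help sanity-check the numbers above. A minimal sketch, assuming an intercept-only model (so the residuals are just the data minus its mean): the Durbin–Watson statistic is the sum of squared successive differences of the residuals divided by the residual sum of squares, and it is approximately 2(1 - r1), where r1 is the lag-1 autocorrelation of the residuals. Note that it squares the differences and normalizes, whereas a plain `np.sum(np.diff(resid))` telescopes to the last residual minus the first. `dw_by_hand` is a hypothetical helper name, and the seeded noise is illustrative data.

```python
import numpy as np
from statsmodels.stats.stattools import durbin_watson


def dw_by_hand(data):
    """Durbin-Watson statistic of an intercept-only regression (hypothetical helper)."""
    data = np.asarray(data, dtype=float)
    resid = data - data.mean()          # OLS on a constant leaves the demeaned data
    num = np.sum(np.diff(resid) ** 2)   # squared successive differences
    den = np.sum(resid ** 2)            # residual sum of squares
    return num / den


ramp = np.arange(2000)
noise = np.random.default_rng(0).standard_normal(2000)

for name, series in [("range", ramp), ("rand", noise)]:
    resid = series - series.mean()
    d = dw_by_hand(series)
    # lag-1 autocorrelation of the residuals
    r1 = np.sum(resid[1:] * resid[:-1]) / np.sum(resid ** 2)
    print("%s: dw=%f statsmodels=%f 2*(1-r1)=%f"
          % (name, d, durbin_watson(resid), 2 * (1 - r1)))
```

For the ramp, the by-hand value is about 3e-6, in line with the `dw of range` output above.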

Edd Barrett answered Nov 16 '22 04:11