
Correlation of Two Variables in a Time Series in Python?

If I have two data sets that form a time series, is there a simple way to find the correlation between them in Python?

For example with:

from datetime import time
# [ (dateTimeObject, y, z) ... ]
x = [ (time(8, 0), 12, 8), (time(8, 10), 15, 10) ]  # ... more rows follow

How might I get the correlation of y and z in Python?

Kyle Brandt asked Jan 26 '11 20:01


4 Answers

I'm a little slow on the uptake here, but pandas (http://github.com/wesm/pandas and pandas.sourceforge.net) is probably your best bet. I'm biased because I wrote it, but:

In [7]: ts1
Out[7]: 
2000-01-03 00:00:00    -0.945653010936
2000-01-04 00:00:00    0.759529904445
2000-01-05 00:00:00    0.177646448683
2000-01-06 00:00:00    0.579750822716
2000-01-07 00:00:00    -0.0752734982291
2000-01-10 00:00:00    0.138730447557
2000-01-11 00:00:00    -0.506961851495

In [8]: ts2
Out[8]: 
2000-01-03 00:00:00    1.10436688823
2000-01-04 00:00:00    0.110075215713
2000-01-05 00:00:00    -0.372818939799
2000-01-06 00:00:00    -0.520443811368
2000-01-07 00:00:00    -0.455928700936
2000-01-10 00:00:00    1.49624355051
2000-01-11 00:00:00    -0.204383054598

In [9]: ts1.corr(ts2)
Out[9]: -0.34768587480980645

Notably, if your two series cover different sets of dates, corr aligns them on their index and computes the correlation over the dates they have in common. It also excludes NaN values automatically!
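
For the list-of-tuples layout in the question, a minimal sketch might look like the following. The calendar date and the third and fourth rows are made up for illustration (the question only gives times of day and two rows), and the column names are arbitrary.

```python
from datetime import datetime

import pandas as pd

# Hypothetical rows in the question's (timestamp, y, z) layout.
rows = [
    (datetime(2011, 1, 26, 8, 0), 12, 8),
    (datetime(2011, 1, 26, 8, 10), 15, 10),
    (datetime(2011, 1, 26, 8, 20), 10, 6),
]

# Index by timestamp so y and z become two time series.
df = pd.DataFrame(rows, columns=["timestamp", "y", "z"]).set_index("timestamp")
r = df["y"].corr(df["z"])  # Pearson correlation by default

# A series with an extra timestamp that z does not have: corr() only
# uses the timestamps present in both series.
extra = pd.Series([99.0], index=[datetime(2011, 1, 26, 8, 30)])
y_longer = pd.concat([df["y"].astype(float), extra])
r_shared = y_longer.corr(df["z"])

print(r, r_shared)
```

Both printed values should agree, since the unmatched 8:30 observation is dropped during alignment.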

Wes McKinney answered Nov 18 '22 01:11


SciPy has a statistics module with a correlation function, pearsonr, which returns the Pearson correlation coefficient together with a p-value.

from scipy import stats
# Y and Z are numpy arrays or lists of variables 
stats.pearsonr(Y, Z)
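
Applied to the question's data layout, this could be sketched roughly as below. The sample rows are illustrative only, and the timestamps are replaced with None because pearsonr does not use them.

```python
from scipy import stats

# Hypothetical (timestamp, y, z) rows; the timestamp is ignored here.
x = [(None, 12, 8), (None, 15, 10), (None, 10, 6)]

# Split the tuples into the two variables to compare.
Y = [row[1] for row in x]
Z = [row[2] for row in x]

# pearsonr returns the correlation coefficient and a p-value.
r, p_value = stats.pearsonr(Y, Z)
print(r, p_value)
```

With only a handful of points the p-value carries little weight; it becomes more informative on longer series.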
kefeizhou answered Nov 18 '22 01:11


You can do that via the covariance matrix or the correlation coefficients. http://docs.scipy.org/doc/numpy/reference/generated/numpy.cov.html and http://docs.scipy.org/doc/numpy/reference/generated/numpy.corrcoef.html are the documentation pages for these functions; the former also comes with a sample showing how to use it (corrcoef usage is very similar).

>>> import numpy
>>> x = [ (None, 12, 8), (None, 15, 10), (None, 10, 6) ]
>>> data = numpy.array([[e[1] for e in x], [e[2] for e in x]])
>>> numpy.corrcoef(data)
array([[ 1.        ,  0.99339927],
       [ 0.99339927,  1.        ]])
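
To show how the two routes relate, here is a small sketch that derives the correlation coefficient from the covariance matrix and compares it with corrcoef. It reuses the sample data above; the variable names are chosen for illustration.

```python
import numpy as np

x = [(None, 12, 8), (None, 15, 10), (None, 10, 6)]
y = np.array([e[1] for e in x], dtype=float)
z = np.array([e[2] for e in x], dtype=float)

# 2x2 covariance matrix: variances on the diagonal, cov(y, z) off it.
cov = np.cov(y, z)

# Pearson correlation = covariance divided by the product of the
# standard deviations.
r_from_cov = cov[0, 1] / np.sqrt(cov[0, 0] * cov[1, 1])

# corrcoef performs the same normalization for you.
r_direct = np.corrcoef(y, z)[0, 1]

print(r_from_cov, r_direct)
```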
etarion answered Nov 18 '22 00:11


Use numpy:

import numpy as np
v = [ ('k', 1, 2), ('l', 2, 4), ('m', 13, 9) ]
# corrcoef returns a 2x2 matrix; element [0, 1] is the correlation between the two columns
np.corrcoef([ a[1] for a in v ], [ a[2] for a in v ])[0, 1]
jimmyb answered Nov 18 '22 01:11