Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pandas timeseries comparison

I want to get the average difference between the overlapping parts of two time series. However, both their ranges and their intervals are different. What is the best way to solve these two problems?

Example data:

Series 1:
2014-08-05 05:03:00    25.194      
2014-08-05 05:08:00    25.196      
2014-08-05 05:13:00    25.197      
2014-08-05 05:18:00    25.199      
2014-08-05 05:23:00    25.192      

Series 2:
2014-08-05 05:12:00    25.000000
2014-08-05 05:13:00    25.000000
2014-08-05 05:14:00    25.000000
like image 683
mikael Avatar asked Aug 15 '14 07:08

mikael


1 Answers

Is this what you are looking for?

First you can align both serieses (so they both have the same indices. It is also possible to only reindex one of both to the index of the other with reindex):

In [85]: s1, s2 = s1.align(s2)

In [86]: s1
Out[86]: 
2014-08-05 05:03:00    25.194
2014-08-05 05:08:00    25.196
2014-08-05 05:12:00       NaN
2014-08-05 05:13:00    25.197
2014-08-05 05:14:00       NaN
2014-08-05 05:18:00    25.199
2014-08-05 05:23:00    25.192
dtype: float64

In [87]: s2
Out[87]: 
2014-08-05 05:03:00   NaN
2014-08-05 05:08:00   NaN
2014-08-05 05:12:00    25
2014-08-05 05:13:00    25
2014-08-05 05:14:00    25
2014-08-05 05:18:00   NaN
2014-08-05 05:23:00   NaN
dtype: float64

Then you can interpolate the missing values (eg with linear interpolation based on the time index):

In [88]: s1.interpolate(method='time')
Out[88]: 
2014-08-05 05:03:00    25.1940
2014-08-05 05:08:00    25.1960
2014-08-05 05:12:00    25.1968
2014-08-05 05:13:00    25.1970
2014-08-05 05:14:00    25.1974
2014-08-05 05:18:00    25.1990
2014-08-05 05:23:00    25.1920
dtype: float64

And then just substract both serieses to get the difference:

In [91]: s = s1.interpolate(method='time') - s2.interpolate(method='time')

In [92]: s
Out[92]: 
2014-08-05 05:03:00       NaN
2014-08-05 05:08:00       NaN
2014-08-05 05:12:00    0.1968
2014-08-05 05:13:00    0.1970
2014-08-05 05:14:00    0.1974
2014-08-05 05:18:00    0.1990
2014-08-05 05:23:00    0.1920
dtype: float64

In [93]: s.mean()
Out[93]: 0.19643999999999906
like image 108
joris Avatar answered Sep 29 '22 03:09

joris