I want to get the average difference between the overlapping parts of two time series. However, both their ranges and their intervals are different. What is the best way to solve these two problems?
Example data:
Series 1:
2014-08-05 05:03:00 25.194
2014-08-05 05:08:00 25.196
2014-08-05 05:13:00 25.197
2014-08-05 05:18:00 25.199
2014-08-05 05:23:00 25.192
Series 2:
2014-08-05 05:12:00 25.000000
2014-08-05 05:13:00 25.000000
2014-08-05 05:14:00 25.000000
Is this what you are looking for?
First you can align both serieses (so they both have the same indices. It is also possible to only reindex one of both to the index of the other with reindex
):
In [85]: s1, s2 = s1.align(s2)
In [86]: s1
Out[86]:
2014-08-05 05:03:00 25.194
2014-08-05 05:08:00 25.196
2014-08-05 05:12:00 NaN
2014-08-05 05:13:00 25.197
2014-08-05 05:14:00 NaN
2014-08-05 05:18:00 25.199
2014-08-05 05:23:00 25.192
dtype: float64
In [87]: s2
Out[87]:
2014-08-05 05:03:00 NaN
2014-08-05 05:08:00 NaN
2014-08-05 05:12:00 25
2014-08-05 05:13:00 25
2014-08-05 05:14:00 25
2014-08-05 05:18:00 NaN
2014-08-05 05:23:00 NaN
dtype: float64
Then you can interpolate the missing values (eg with linear interpolation based on the time index):
In [88]: s1.interpolate(method='time')
Out[88]:
2014-08-05 05:03:00 25.1940
2014-08-05 05:08:00 25.1960
2014-08-05 05:12:00 25.1968
2014-08-05 05:13:00 25.1970
2014-08-05 05:14:00 25.1974
2014-08-05 05:18:00 25.1990
2014-08-05 05:23:00 25.1920
dtype: float64
And then just substract both serieses to get the difference:
In [91]: s = s1.interpolate(method='time') - s2.interpolate(method='time')
In [92]: s
Out[92]:
2014-08-05 05:03:00 NaN
2014-08-05 05:08:00 NaN
2014-08-05 05:12:00 0.1968
2014-08-05 05:13:00 0.1970
2014-08-05 05:14:00 0.1974
2014-08-05 05:18:00 0.1990
2014-08-05 05:23:00 0.1920
dtype: float64
In [93]: s.mean()
Out[93]: 0.19643999999999906
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With