Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to aggregate timeseries in Python?

I have two different timeseries with partially overlapping timestamps:

import scikits.timeseries as ts
from datetime import datetime 
a = ts.time_series([1,2,3], dates=[datetime(2010,10,20), datetime(2010,10,21), datetime(2010,10,23)], freq='D')
b = ts.time_series([4,5,6], dates=[datetime(2010,10,20), datetime(2010,10,22), datetime(2010,10,23)], freq='D')

which represents following data:

Day:   20. 21. 22. 23.
  a:    1   2   -   3
  b:    4   -   5   6

I would like to calculate a weighted average on every day with coefficients a(0.3) and b(0.7), while ignoring missing values:

Day 20.: (0.3 * 1 + 0.7 * 4) / (0.3 + 0.7) = 3.1 / 1.  = 3.1
Day 21.: (0.3 * 2          ) / (0.3      ) = 0.6 / 0.3 = 2
Day 22.: (          0.7 * 5) / (      0.7) = 3.5 / 0.7 = 5
Day 23.: (0.3 * 3 + 0.7 * 6) / (0.3 + 0.7) = 3.1 / 1.  = 5.1

when I first try to align these timeseries:

a1, b1 = ts.aligned(a, b)

I get correctly masked timeseries:

timeseries([1 2 -- 3],
  dates = [20-Oct-2010 ... 23-Oct-2010],
  freq  = D)

timeseries([4 -- 5 6],
  dates = [20-Oct-2010 ... 23-Oct-2010],
  freq  = D)

but when I do a1 * 0.3 + b1 * 0.7, it ignores values, that are present in one timeseries only:

timeseries([3.1 -- -- 5.1],
   dates = [20-Oct-2010 ... 23-Oct-2010],
   freq  = D)

What should I do to receive the awaited?

timeseries([3.1 2. 5. 5.1],
   dates = [20-Oct-2010 ... 23-Oct-2010],
   freq  = D)

EDIT: The answer should be applicable also to more than two initial timeseries with different weights and differently missing values.

So if we have four timeseries with weights T1(0.1), T2(0.2), T3(0.3) and T4(0.4), their weights at a given timestamp will be:

            |  T1 |  T2 |  T3 |  T4 |
weight      | 0.1 | 0.2 | 0.3 | 0.4 |
-------------------------------------
all present | 10% | 20% | 30% | 40% |
T1 missing  |     | 22% | 33% | 45% |
T1,T2 miss. |     |     | 43% | 57% |
T4 missing  | 17% | 33% | 50% |     |
etc.
like image 430
eumiro Avatar asked Oct 20 '10 12:10

eumiro


1 Answers

I have tried and found this:

aWgt = 0.3
bWgt = 0.7

print (np.where(a1.mask, 0., a1.data * aWgt) +
       np.where(b1.mask, 0., b1.data * bWgt)) / (np.where(a1.mask, 0., aWgt) +
                                                 np.where(b1.mask, 0., bWgt))

# array([ 3.1,  2. ,  5. ,  5.1])

This is applicable to the edited question with more than one initial timeseries. But hopefully someone will find better.

EDIT: And this is my function:

def weightedAvg(weightedTimeseries):
    sumA = np.sum((np.where(ts.mask, 0., ts.data * weight) for ts, weight in weightedTimeseries), axis=0)
    sumB = np.sum((np.where(ts.mask, 0., weight) for ts, weight in weightedTimeseries), axis=0)
    return np.divide(sumA, sumB)

weightedAvg(((a1, 0.3), (bb, 0.7)))
# array([ 3.1,  2. ,  5. ,  5.1])

Works for any number of timeseries ;-)

like image 134
eumiro Avatar answered Sep 24 '22 15:09

eumiro