
How to align two unequal-sized timeseries numpy arrays?

I have two numpy arrays containing timeseries (unix timestamps).
I want to find pairs of timestamps (1 from each array) whose difference is within a threshold.

To achieve this, I need to align the two time series into two arrays, such that each index holds a closest pair. (If two timestamps in one array are equally close to a timestamp in the other array, I don't mind choosing either one, as the count of pairs is more important than the actual values.)

So the aligned data set will consist of two arrays of the same size, with the smaller array padded with empty data.

I was thinking of using the timeseries package and its align function, but I am not sure how to use aligned for my data, which is a timeseries.

Example: consider two timeseries arrays:

ts1=np.array([ 1311242821.0, 1311242882.0, 1311244025.0, 1311244145.0, 1311251330.0, 
               1311282555.0, 1311282614.0])
ts2=np.array([ 1311226761.0, 1311227001.0, 1311257033.0, 1311257094.0, 1311281265.0])

Output sample:

Now for ts2[2] (1311257033.0), its closest pair should be ts1[4] (1311251330.0) because the difference is 5703.0, which is within the threshold, and it is the smallest. Now that ts2[2] and ts1[4] are already paired they should be left out of other calculations.

Such pairs should be found, so the output arrays might be longer than the original arrays:

abs(ts1[0]-ts2[0]) = 16060
abs(ts1[0]-ts2[1]) = 15820 //pair
abs(ts1[0]-ts2[2]) = 14212
abs(ts1[0]-ts2[3]) = 14273
abs(ts1[0]-ts2[4]) = 38444


abs(ts1[1]-ts2[0]) = 16121
abs(ts1[1]-ts2[1]) = 15881
abs(ts1[1]-ts2[2]) = 14151
abs(ts1[1]-ts2[3]) = 14212
abs(ts1[1]-ts2[4]) = 38383


abs(ts1[2]-ts2[0]) = 17264
abs(ts1[2]-ts2[1]) = 17024
abs(ts1[2]-ts2[2]) = 13008
abs(ts1[2]-ts2[3]) = 13069
abs(ts1[2]-ts2[4]) = 37240


abs(ts1[3]-ts2[0]) = 17384
abs(ts1[3]-ts2[1]) = 17144
abs(ts1[3]-ts2[2]) = 12888
abs(ts1[3]-ts2[3]) = 12949
abs(ts1[3]-ts2[4]) = 37120


abs(ts1[4]-ts2[0]) = 24569
abs(ts1[4]-ts2[1]) = 24329
abs(ts1[4]-ts2[2]) = 5703 //pair
abs(ts1[4]-ts2[3]) = 5764
abs(ts1[4]-ts2[4]) = 29935


abs(ts1[5]-ts2[0]) = 55794
abs(ts1[5]-ts2[1]) = 55554
abs(ts1[5]-ts2[2]) = 25522
abs(ts1[5]-ts2[3]) = 25461
abs(ts1[5]-ts2[4]) = 1290 //pair


abs(ts1[6]-ts2[0]) = 55853
abs(ts1[6]-ts2[1]) = 55613
abs(ts1[6]-ts2[2]) = 25581
abs(ts1[6]-ts2[3]) = 25520
abs(ts1[6]-ts2[4]) = 1349
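
The table of differences above can be reproduced in one step with numpy broadcasting. This is only a minimal sketch; the name diff is chosen here for illustration and does not come from the question.

```python
import numpy as np

ts1 = np.array([1311242821.0, 1311242882.0, 1311244025.0, 1311244145.0,
                1311251330.0, 1311282555.0, 1311282614.0])
ts2 = np.array([1311226761.0, 1311227001.0, 1311257033.0, 1311257094.0,
                1311281265.0])

# diff[i, j] == abs(ts1[i] - ts2[j]); broadcasting builds the full 7x5 table.
diff = np.abs(ts1[:, None] - ts2[None, :])

# Print the table in the same layout as the hand-written list.
for i, row in enumerate(diff):
    for j, d in enumerate(row):
        print("abs(ts1[{}]-ts2[{}]) = {:.0f}".format(i, j, d))
```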


So the pairs are: (ts1[0],ts2[1]), (ts1[4],ts2[2]), (ts1[5],ts2[4]).
The rest of the elements should have null as their pair.
The final two arrays will be of size 9.
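
The pairing rule described above (closest first, each timestamp used at most once, difference within a threshold) could be sketched as a greedy pass over the sorted differences. The function name greedy_pairs and the threshold of 6000 seconds are assumptions made for illustration; the question does not state a threshold. A strictly closest-first pass does not necessarily reproduce the hand-picked pairs listed above, and with a larger threshold it can return pairs that cross in time order.

```python
import numpy as np

def greedy_pairs(a, b, threshold):
    """Pair elements of a and b closest-first; each index is used at most once."""
    diff = np.abs(a[:, None] - b[None, :])
    used_a, used_b, pairs = set(), set(), []
    # Visit candidate (i, j) cells from smallest to largest difference.
    for flat in np.argsort(diff, axis=None, kind="stable"):
        i, j = (int(k) for k in np.unravel_index(flat, diff.shape))
        if diff[i, j] > threshold:
            break  # every remaining candidate is even further apart
        if i not in used_a and j not in used_b:
            pairs.append((i, j))
            used_a.add(i)
            used_b.add(j)
    return sorted(pairs)

ts1 = np.array([1311242821.0, 1311242882.0, 1311244025.0, 1311244145.0,
                1311251330.0, 1311282555.0, 1311282614.0])
ts2 = np.array([1311226761.0, 1311227001.0, 1311257033.0, 1311257094.0,
                1311281265.0])

# Hypothetical threshold of 6000 seconds.
print(greedy_pairs(ts1, ts2, 6000.0))  # [(4, 2), (5, 4)]
```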

Please let me know if this question is clear.

Dexters asked May 28 '12 17:05


2 Answers

A solution using numpy mask arrays that outputs the aligned timeseries (_ts1, _ts2).
The result is 3 pairs. Only pairs with an index distance of 1 can be used to align the timeseries, therefore threshold=1.

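import numpy as np

# ts1 and ts2 are the arrays from the question.
def compute_diffs(threshold):
    # Note: the 'threshold' field stores the index distance abs(i1 - i2),
    # not a time difference.
    dtype = [('diff', int), ('ts1', int), ('ts2', int), ('threshold', int)]
    diffs = np.empty((ts1.shape[0], ts2.shape[0]), dtype=dtype)
    pairs = np.ma.make_mask_none(diffs.shape)

    for i1, t1 in enumerate(ts1):
        for i2, t2 in enumerate(ts2):
            diffs[i1, i2] = (abs(t1 - t2), i1, i2, abs(i1-i2))

        # Mark a pair only if exactly one entry in this row has the
        # requested index distance.
        d1 = diffs[i1][diffs[i1]['threshold'] == threshold]
        if d1.size == 1:
            (diff, y, x, t) = d1[0]
            pairs[y, x] = True
    return diffs, pairs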

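def align_timeseries(diffs):
    def _sync(ts, ts1, ts2, i1, i2, ii):
        # Copy the unpaired elements ts[i1:i2] into ts1 and leave the
        # matching slots of ts2 empty.
        while i1 < i2:
            ts1[ii] = ts[i1]
            i1 += 1
            ts2[ii] = DTNULL
            ii += 1
        return ii, i1

    # DTNULL is the placeholder for an empty slot (its definition is not shown here).
    # Each pair shares one slot, so the aligned length is the sum of both
    # sizes minus the number of pairs (9 for the example data).
    size = ts1.size + ts2.size - len(diffs)
    _ts1 = np.array([DTNULL]*size)
    _ts2 = np.copy(_ts1)
    ii = _i1 = _i2 = 0

    for diff, i1, i2, t in np.sort(diffs, order='ts1'):
        ii, _i1 = _sync(ts1, _ts1, _ts2, _i1, i1, ii)
        ii, _i2 = _sync(ts2, _ts2, _ts1, _i2, i2, ii)

        if _i1 == i1:
            _ts1[ii] = ts1[i1]
            _i1 += 1
            _ts2[ii] = ts2[i2]
            _i2 += 1
            ii += 1

    # Flush the elements that follow the last pair in either series.
    ii, _i1 = _sync(ts1, _ts1, _ts2, _i1, ts1.size, ii)
    ii, _i2 = _sync(ts2, _ts2, _ts1, _i2, ts2.size, ii)
    return _ts1, _ts2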

main:

diffs, pairs = compute_diffs(threshold=1)
print('diffs[pairs]:{}'.format(diffs[pairs]))
_ts1, _ts2 = align_timeseries(diffs[pairs])
pprint(ts1, ts2, _ts1, _ts2)

Output:

diffs[pairs]:[(15820, 0, 1) ( 5703, 4, 2) ( 1290, 5, 4)]
           ts1                  ts2                    _ts1          diff          _ts2
0: 2011-07-21 12:07:01  2011-07-21 07:39:21     ---- -- -- -- -- --  ----   2011-07-21 07:39:21
1: 2011-07-21 12:08:02  2011-07-21 07:43:21     2011-07-21 12:07:01 15820   2011-07-21 07:43:21
2: 2011-07-21 12:27:05  2011-07-21 16:03:53     2011-07-21 12:08:02  ----   ---- -- -- -- -- --
3: 2011-07-21 12:29:05  2011-07-21 16:04:54     2011-07-21 12:27:05  ----   ---- -- -- -- -- --
4: 2011-07-21 14:28:50  2011-07-21 22:47:45     2011-07-21 12:29:05  ----   ---- -- -- -- -- --
5: 2011-07-21 23:09:15  ---- -- -- -- -- --     2011-07-21 14:28:50  5703   2011-07-21 16:03:53
6: 2011-07-21 23:10:14  ---- -- -- -- -- --     ---- -- -- -- -- --  ----   2011-07-21 16:04:54
7: ---- -- -- -- -- --  ---- -- -- -- -- --     2011-07-21 23:09:15  1290   2011-07-21 22:47:45
8: ---- -- -- -- -- --  ---- -- -- -- -- --     2011-07-21 23:10:14  ----   ---- -- -- -- -- --

Tested with Python: 3.4.2
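
For readers who want to try the alignment step on its own, here is a self-contained sketch that takes the three pairs from the question as given and pads unpaired slots with NaN. The function name align_from_pairs and the use of NaN as the empty value are assumptions made for this sketch; they are not part of the code above. It also assumes the pairs do not cross in time order.

```python
import numpy as np

def align_from_pairs(a, b, pairs, fill=np.nan):
    """Merge a and b into two equal-length arrays; unpaired slots get `fill`.

    `pairs` is a list of (index_in_a, index_in_b) tuples that must not cross,
    i.e. both indices increase together when the list is sorted.
    """
    out_a, out_b = [], []
    ia = ib = 0
    for pa, pb in sorted(pairs):
        # Emit unpaired elements of a, then of b, that precede this pair.
        while ia < pa:
            out_a.append(a[ia])
            out_b.append(fill)
            ia += 1
        while ib < pb:
            out_a.append(fill)
            out_b.append(b[ib])
            ib += 1
        # Emit the pair itself in a shared slot.
        out_a.append(a[pa])
        out_b.append(b[pb])
        ia, ib = pa + 1, pb + 1
    # Flush whatever is left after the last pair.
    for x in a[ia:]:
        out_a.append(x)
        out_b.append(fill)
    for y in b[ib:]:
        out_a.append(fill)
        out_b.append(y)
    return np.array(out_a), np.array(out_b)

ts1 = np.array([1311242821.0, 1311242882.0, 1311244025.0, 1311244145.0,
                1311251330.0, 1311282555.0, 1311282614.0])
ts2 = np.array([1311226761.0, 1311227001.0, 1311257033.0, 1311257094.0,
                1311281265.0])

# Pairs as listed in the question.
a_al, b_al = align_from_pairs(ts1, ts2, [(0, 1), (4, 2), (5, 4)])
print(len(a_al), len(b_al))  # 9 9
```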

stovfl answered Sep 17 '22 13:09


I don't know what you mean by aligning timestamps. But you can use the time module to represent timestamps as floats or integers. In a first step, you can convert any user format to a time.struct_time. In a second step, you can convert this to seconds since the start of the epoch. Then you have numeric values to do calculations with the timestamps.

How to convert user format using time.strptime() is well explained in the docs:

    >>> import time
    >>> t = time.strptime("30 Nov 00", "%d %b %y")
    >>> t
    time.struct_time(tm_year=2000, tm_mon=11, tm_mday=30, tm_hour=0, tm_min=0,
             tm_sec=0, tm_wday=3, tm_yday=335, tm_isdst=-1)
    >>> time.mktime(t)
    975538800.0
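
One caveat worth illustrating: time.mktime() interprets the struct_time as local time, so the value shown above depends on the machine's timezone (975538800.0 corresponds to a machine one hour ahead of UTC). As a minimal sketch, assuming the input should be read as UTC, calendar.timegm() gives a timezone-independent result:

```python
import calendar
import time
from datetime import datetime, timezone

t = time.strptime("30 Nov 00", "%d %b %y")

# calendar.timegm(t) treats t as UTC, unlike time.mktime(t),
# which treats it as local time.
utc_seconds = calendar.timegm(t)
print(utc_seconds)  # 975542400

# The same instant built with a timezone-aware datetime, for comparison.
dt = datetime(2000, 11, 30, tzinfo=timezone.utc)
print(int(dt.timestamp()))  # 975542400
```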

Schuh answered Sep 18 '22 13:09