I have two numpy arrays containing time series (Unix timestamps).
I want to find pairs of timestamps (one from each array) whose difference is within a threshold.
To achieve this, I need to align the two time series into two arrays, such that each index holds a timestamp and its closest partner. (If two timestamps in one array are equally close to a timestamp in the other array, I don't mind which one is chosen, as the count of pairs is more important than the actual values.)
So the aligned data set will consist of two arrays of the same size, with the smaller array being padded with empty data.
I was thinking of using the timeseries package and the align function, but I am not sure how to use aligned for my data, which is a timeseries.
Example: consider two timeseries arrays:
ts1=np.array([ 1311242821.0, 1311242882.0, 1311244025.0, 1311244145.0, 1311251330.0,
1311282555.0, 1311282614.0])
ts2=np.array([ 1311226761.0, 1311227001.0, 1311257033.0, 1311257094.0, 1311281265.0])
Output sample:
Now for ts2[2] (1311257033.0), its closest pair should be ts1[4] (1311251330.0) because the difference is 5703.0, which is within the threshold, and it is the smallest. Now that ts2[2] and ts1[4] are already paired, they should be left out of other calculations.
Such pairs should be found, so the output array might be longer than the actual arrays.
abs(ts1[0]-ts2[0]) = 16060
abs(ts1[0]-ts2[1]) = 15820 //pair
abs(ts1[0]-ts2[2]) = 14212
abs(ts1[0]-ts2[3]) = 14273
abs(ts1[0]-ts2[4]) = 38444
abs(ts1[1]-ts2[0]) = 16121
abs(ts1[1]-ts2[1]) = 15881
abs(ts1[1]-ts2[2]) = 14151
abs(ts1[1]-ts2[3]) = 14212
abs(ts1[1]-ts2[4]) = 38383
abs(ts1[2]-ts2[0]) = 17264
abs(ts1[2]-ts2[1]) = 17024
abs(ts1[2]-ts2[2]) = 13008
abs(ts1[2]-ts2[3]) = 13069
abs(ts1[2]-ts2[4]) = 37240
abs(ts1[3]-ts2[0]) = 17384
abs(ts1[3]-ts2[1]) = 17144
abs(ts1[3]-ts2[2]) = 12888
abs(ts1[3]-ts2[3]) = 12949
abs(ts1[3]-ts2[4]) = 37120
abs(ts1[4]-ts2[0]) = 24569
abs(ts1[4]-ts2[1]) = 24329
abs(ts1[4]-ts2[2]) = 5703 //pair
abs(ts1[4]-ts2[3]) = 5764
abs(ts1[4]-ts2[4]) = 29935
abs(ts1[5]-ts2[0]) = 55794
abs(ts1[5]-ts2[1]) = 55554
abs(ts1[5]-ts2[2]) = 25522
abs(ts1[5]-ts2[3]) = 25461
abs(ts1[5]-ts2[4]) = 1290 //pair
abs(ts1[6]-ts2[0]) = 55853
abs(ts1[6]-ts2[1]) = 55613
abs(ts1[6]-ts2[2]) = 25581
abs(ts1[6]-ts2[3]) = 25520
abs(ts1[6]-ts2[4]) = 1349
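A table like the one above does not have to be computed by hand. As a minimal sketch, assuming the two arrays from the example, NumPy broadcasting gives every pairwise absolute difference in one step (the name diff is only illustrative):

```python
import numpy as np

ts1 = np.array([1311242821.0, 1311242882.0, 1311244025.0, 1311244145.0,
                1311251330.0, 1311282555.0, 1311282614.0])
ts2 = np.array([1311226761.0, 1311227001.0, 1311257033.0, 1311257094.0,
                1311281265.0])

# diff[i, j] == abs(ts1[i] - ts2[j]); a column vector broadcast against a
# row vector yields the full (len(ts1), len(ts2)) matrix.
diff = np.abs(ts1[:, None] - ts2[None, :])

# Print the matrix in the same "abs(ts1[i]-ts2[j]) = value" layout.
for i, j in np.ndindex(*diff.shape):
    print("abs(ts1[{}]-ts2[{}]) = {:.0f}".format(i, j, diff[i, j]))
```

The matrix needs len(ts1) * len(ts2) values, so this is only practical while both arrays are reasonably small.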
So the pairs are: (ts1[0],ts2[1]), (ts1[4],ts2[2]), (ts1[5],ts2[4])
The rest of the elements should have null as their pair.
The final two arrays will be of size 9.
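To make the expected result concrete, here is a rough sketch of how the two size-9 arrays could be built once the pairs are known. It assumes NaN as the null value (the question does not say which null to use) and assumes the pair indices increase in both arrays; align_from_pairs is a hypothetical helper name.

```python
import numpy as np

ts1 = np.array([1311242821.0, 1311242882.0, 1311244025.0, 1311244145.0,
                1311251330.0, 1311282555.0, 1311282614.0])
ts2 = np.array([1311226761.0, 1311227001.0, 1311257033.0, 1311257094.0,
                1311281265.0])

def align_from_pairs(ts1, ts2, pairs):
    """Merge ts1 and ts2 into two equal-length arrays; NaN means "no partner".

    pairs is a list of (i, j) index pairs, assumed increasing in both i and j.
    """
    out1, out2 = [], []
    p1 = p2 = 0  # next unconsumed index in ts1 / ts2
    for i, j in sorted(pairs):
        # Unpaired ts1 values before this pair get a null partner.
        out1.extend(ts1[p1:i])
        out2.extend([np.nan] * (i - p1))
        # Unpaired ts2 values before this pair get a null partner.
        out1.extend([np.nan] * (j - p2))
        out2.extend(ts2[p2:j])
        # The pair itself shares one row.
        out1.append(ts1[i])
        out2.append(ts2[j])
        p1, p2 = i + 1, j + 1
    # Whatever is left in either array after the last pair is unpaired.
    out1.extend(ts1[p1:])
    out2.extend([np.nan] * (len(ts1) - p1))
    out1.extend([np.nan] * (len(ts2) - p2))
    out2.extend(ts2[p2:])
    return np.array(out1), np.array(out2)

a1, a2 = align_from_pairs(ts1, ts2, [(0, 1), (4, 2), (5, 4)])
```

With the three pairs listed above, both a1 and a2 have 7 + 5 - 3 = 9 entries.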
Please let me know if this question is clear.
Solution using numpy mask arrays; the output is the aligned timeseries (_ts1, _ts2).
The result is 3 pairs. Only pairs with an index distance of 1 (abs(i1-i2) == 1) can be used to align the timeseries, therefore threshold=1.
import numpy as np

# ts1 and ts2 are the arrays from the question. DTNULL (the "null" placeholder)
# and pprint (a helper that prints the table below) are not shown in this answer.

def compute_diffs(threshold):
    # The 'threshold' field stores the index distance abs(i1 - i2).
    dtype = [('diff', int), ('ts1', int), ('ts2', int), ('threshold', int)]
    diffs = np.empty((ts1.shape[0], ts2.shape[0]), dtype=dtype)
    pairs = np.ma.make_mask_none(diffs.shape)

    for i1, t1 in enumerate(ts1):
        for i2, t2 in enumerate(ts2):
            diffs[i1, i2] = (abs(t1 - t2), i1, i2, abs(i1 - i2))

        # Mark a pair only if exactly one entry in this row matches the threshold.
        d1 = diffs[i1][diffs[i1]['threshold'] == threshold]
        if d1.size == 1:
            (diff, y, x, t) = d1[0]
            pairs[y, x] = True
    return diffs, pairs

def align_timeseries(diffs):
    def _sync(ts, ts1, ts2, i1, i2, ii):
        # Copy unpaired values ts[i1:i2] into ts1 and put DTNULL next to them in ts2.
        while i1 < i2:
            ts1[ii] = ts[i1]
            i1 += 1
            ts2[ii] = DTNULL
            ii += 1
        return ii, i1

    _ts1 = np.array([DTNULL] * 9)
    _ts2 = np.copy(_ts1)
    ii = _i1 = _i2 = 0

    for n, (diff, i1, i2, t) in enumerate(np.sort(diffs, order='ts1')):
        # Emit everything that comes before the pair, then the pair itself.
        ii, _i1 = _sync(ts1, _ts1, _ts2, _i1, i1, ii)
        ii, _i2 = _sync(ts2, _ts2, _ts1, _i2, i2, ii)

        if _i1 == i1:
            _ts1[ii] = ts1[i1]
            _i1 += 1
            _ts2[ii] = ts2[i2]
            _i2 += 1
            ii += 1

    # Emit the remaining unpaired values of ts1.
    ii, _i1 = _sync(ts1, _ts1, _ts2, _i1, ts1.size, ii)
    return _ts1, _ts2

if __name__ == '__main__':
    diffs, pairs = compute_diffs(threshold=1)
    print('diffs[pairs]:{}'.format(diffs[pairs]))
    _ts1, _ts2 = align_timeseries(diffs[pairs])
    pprint(ts1, ts2, _ts1, _ts2)
Output:
diffs[pairs]:[(15820, 0, 1) ( 5703, 4, 2) ( 1290, 5, 4)]
   ts1                  ts2                  _ts1                  diff  _ts2
0: 2011-07-21 12:07:01  2011-07-21 07:39:21  ---- -- -- -- -- --   ----  2011-07-21 07:39:21
1: 2011-07-21 12:08:02  2011-07-21 07:43:21  2011-07-21 12:07:01  15820  2011-07-21 07:43:21
2: 2011-07-21 12:27:05  2011-07-21 16:03:53  2011-07-21 12:08:02   ----  ---- -- -- -- -- --
3: 2011-07-21 12:29:05  2011-07-21 16:04:54  2011-07-21 12:27:05   ----  ---- -- -- -- -- --
4: 2011-07-21 14:28:50  2011-07-21 22:47:45  2011-07-21 12:29:05   ----  ---- -- -- -- -- --
5: 2011-07-21 23:09:15  ---- -- -- -- -- --  2011-07-21 14:28:50   5703  2011-07-21 16:03:53
6: 2011-07-21 23:10:14  ---- -- -- -- -- --  ---- -- -- -- -- --   ----  2011-07-21 16:04:54
7: ---- -- -- -- -- --  ---- -- -- -- -- --  2011-07-21 23:09:15   1290  2011-07-21 22:47:45
8: ---- -- -- -- -- --  ---- -- -- -- -- --  2011-07-21 23:10:14   ----  ---- -- -- -- -- --
Tested with Python: 3.4.2
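For comparison, the pairing rule as worded in the question (closest difference first, each timestamp used at most once, difference within a threshold in seconds) can also be read as a greedy match. The following is only a sketch of that reading, not the method used above; greedy_pairs and the threshold of 6000 seconds are assumptions made for illustration, since the question does not state a threshold.

```python
import numpy as np

ts1 = np.array([1311242821.0, 1311242882.0, 1311244025.0, 1311244145.0,
                1311251330.0, 1311282555.0, 1311282614.0])
ts2 = np.array([1311226761.0, 1311227001.0, 1311257033.0, 1311257094.0,
                1311281265.0])

def greedy_pairs(ts1, ts2, threshold):
    """Return (i, j) index pairs, closest first, each index used at most once."""
    diff = np.abs(ts1[:, None] - ts2[None, :])
    used1 = np.zeros(len(ts1), dtype=bool)
    used2 = np.zeros(len(ts2), dtype=bool)
    pairs = []
    # Visit all candidate pairs from the smallest to the largest difference.
    for flat in np.argsort(diff, axis=None, kind="stable"):
        i, j = np.unravel_index(flat, diff.shape)
        if diff[i, j] > threshold:
            break  # every remaining candidate is further apart
        if not used1[i] and not used2[j]:
            used1[i] = used2[j] = True
            pairs.append((int(i), int(j)))
    return sorted(pairs)

print(greedy_pairs(ts1, ts2, 6000))
```

With a threshold of 6000 seconds this prints [(4, 2), (5, 4)]. A greedy pass is simple, but it is not guaranteed to produce the largest possible number of pairs.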
I don't know what you mean by aligning timestamps, but you can use the time module to represent timestamps as floats or integers. In a first step, you can convert any user format to a time.struct_time tuple. In a second step, you can convert this to seconds since the start of the epoch. Then you have integer values to do calculations with the timestamps.
How to convert a user format using time.strptime() is well explained in the docs:
>>> import time
>>> t = time.strptime("30 Nov 00", "%d %b %y")
>>> t
time.struct_time(tm_year=2000, tm_mon=11, tm_mday=30, tm_hour=0, tm_min=0,
tm_sec=0, tm_wday=3, tm_yday=335, tm_isdst=-1)
>>> time.mktime(t)
975538800.0
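Applied to a whole list of formatted timestamps, the two steps could look roughly like this. The format string and the sample strings are assumptions for illustration. Note that the sketch swaps time.mktime() for calendar.timegm(): mktime() interprets the struct_time as local time, while timegm() interprets it as UTC, so the result does not depend on the machine's timezone.

```python
import calendar
import time

import numpy as np

def to_epoch_utc(text, fmt="%Y-%m-%d %H:%M:%S"):
    """Parse a formatted timestamp, read as UTC, into seconds since the epoch."""
    return calendar.timegm(time.strptime(text, fmt))

# Hypothetical input strings; these happen to match ts1[0] and ts1[1] in UTC.
stamps = ["2011-07-21 10:07:01", "2011-07-21 10:08:02"]
ts = np.array([to_epoch_utc(s) for s in stamps], dtype=float)
print(ts)
```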