Compress a list of numbers into unique non-overlapping time ranges using Python

Question

I'm from biology and very new to Python and ML. The lab has a black-box ML model which outputs a sequence like this:

Predictions = [1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,0,1,0,1,0,1,0,1,1,1,1,1,0,0,0,1,1,1,1,1,1,0]

Each value represents a predicted time frame of duration 0.25 seconds.
1 means High.
0 means Not High.

How do I convert these predictions into a list of [start, stop, label] entries, so that consecutive runs of the same value are grouped? For example, the first 10 ones span 0 to 10 * 0.25 s, so the first range and label would be

[0.0, 2.5, High]

Next there are 13 zeroes, so start = 2.5, stop = 13 * 0.25 + 2.5 = 5.75, and label = Not-High, which gives

[2.5, 5.75, Not-High]

So the final result would be a list of lists: unique, non-overlapping intervals, each with a label, like:

[[0.0, 2.5, High],
[2.5, 5.75, Not-High],
[5.75, 6.50, High], ...]

What I tried:
1. Count the number of values in Predictions.
2. Generate two ranges, one starting at zero and another starting at 0.25.
3. Merge these two ranges into tuples.

import numpy as np  
len_pred = len(Predictions) 
range_1 = np.arange(0,len_pred,0.25)
range_2 = np.arange(0.25,len_pred,0.25)
new_range = zip(range_1,range_2)

Here I'm able to get the ranges, but I'm missing the labels.
It seems like a simple problem, but I'm running in circles.

Please advise. Thanks.

Steve · Accepted Answer

You can iterate through the list and create a range whenever you detect a change in value. With this method you also need to append the final range after the loop, because no change follows it. It might not be super clean, but it should be effective.

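# Assumes `predictions` is the non-empty list of 0/1 values from the question.
current_time = 0.0
range_start = 0.0
current_value = predictions[0]
ranges = []
for p in predictions:
    if p != current_value:
        # The value changed: close the range that just ended.
        ranges.append([range_start, current_time, 'high' if current_value == 1 else 'not high'])
        range_start = current_time
        current_value = p
    current_time += 0.25  # each prediction covers 0.25 seconds
# Close the final range, which is never followed by a change.
ranges.append([range_start, current_time, 'high' if current_value == 1 else 'not high'])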

Updated to fix a few off-by-one errors.
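
For comparison, the same run detection can probably also be sketched with itertools.groupby from the standard library, which groups consecutive equal values. This is a swapped-in alternative rather than the loop above, and the function name to_ranges and the frame parameter are made up for illustration.

```python
from itertools import groupby

def to_ranges(predictions, frame=0.25):
    """Collapse consecutive equal predictions into [start, stop, label] lists.

    `to_ranges` and `frame` are hypothetical names, not from the question.
    """
    ranges = []
    start_idx = 0
    for value, run in groupby(predictions):
        run_len = sum(1 for _ in run)          # number of frames in this run
        stop_idx = start_idx + run_len
        label = 'high' if value == 1 else 'not high'
        # Multiply indices by the frame length instead of accumulating floats
        ranges.append([start_idx * frame, stop_idx * frame, label])
        start_idx = stop_idx
    return ranges

print(to_ranges([1, 1, 1, 0, 0, 1]))
# prints [[0.0, 0.75, 'high'], [0.75, 1.25, 'not high'], [1.25, 1.5, 'high']]
```

An empty input simply returns an empty list here, so no special case for the final range is needed.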

HYRY · Answer

By using np.diff() and np.where() you can find all the indices where the value changes:

import numpy as np

p = np.array([1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,0,1,0,1,0,1,0,1,1,1,1,1,0,0,0,1,1,1,1,1,1,0])

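# Index where each run starts, plus len(p) to mark the end of the last run
idx = np.r_[0, np.where(np.diff(p) != 0)[0]+1, len(p)]
# Convert frame indices to seconds (0.25 s per frame)
t = idx * 0.25

# Columns: start, stop, value of the run
np.c_[t[:-1], t[1:], p[idx[:-1]]]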

Output:

array([[  0.  ,   2.5 ,   1.  ],
       [  2.5 ,   5.75,   0.  ],
       [  5.75,   6.5 ,   1.  ],
       [  6.5 ,   6.75,   0.  ],
       [  6.75,   7.  ,   1.  ],
       [  7.  ,   7.25,   0.  ],
       [  7.25,   7.5 ,   1.  ],
       [  7.5 ,   7.75,   0.  ],
       [  7.75,   8.  ,   1.  ],
       [  8.  ,   8.25,   0.  ],
       [  8.25,   9.5 ,   1.  ],
       [  9.5 ,  10.25,   0.  ],
       [ 10.25,  11.75,   1.  ],
       [ 11.75,  12.  ,   0.  ]])
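
The array above holds the label as a number (1. or 0.), while the question asks for [start, stop, label] with a text label. A minimal sketch of that last step follows; the helper name label_rows is made up for illustration, and the sample rows are copied from the first three rows of the output. It should work on the NumPy array as well, since iterating a 2-D array yields its rows.

```python
def label_rows(rows):
    """Turn (start, stop, value) rows into [start, stop, label] lists.

    `label_rows` is a hypothetical helper, not part of NumPy.
    """
    labeled = []
    for start, stop, value in rows:
        label = 'High' if value == 1 else 'Not-High'
        # float() also converts NumPy scalars to plain Python floats
        labeled.append([float(start), float(stop), label])
    return labeled

# First three rows of the output, written as plain lists
rows = [[0.0, 2.5, 1.0], [2.5, 5.75, 0.0], [5.75, 6.5, 1.0]]
print(label_rows(rows))
# prints [[0.0, 2.5, 'High'], [2.5, 5.75, 'Not-High'], [5.75, 6.5, 'High']]
```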

Tags: python, algorithm, numpy, python-2.7

Seirra
