
Compress a list of numbers into unique non-overlapping time ranges using Python

I'm from biology and very new to Python and ML. The lab has a black-box ML model which outputs a sequence like this:

Predictions =
[1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,0,1,0,1,0,1,0,1,1,1,1,1,0,0,0,1,1,1,1,1,1,0]  

Each value represents a predicted time frame of duration 0.25 seconds.
1 means High.
0 means Not High.

How do I convert these predictions into a list of [start, stop, label] entries, so that runs of identical values are grouped together?
For example, the first 10 ones cover 0 to 10 * 0.25 s, so the first range and label would be

[0.0, 2.5, High]

Next there are 13 zeroes, so start = 2.5, stop = 13 * 0.25 + 2.5 = 5.75, label = Not-High, which gives

[2.5, 5.75, Not-High]

So the final result would be a list of lists, each holding a unique, non-overlapping interval along with its label, like:

[[0.0, 2.5, High],
[2.5, 5.75, Not-High],
[5.75, 6.50, High], ...]

What I tried:
1. Count number of values in Predictions
2. Generate two ranges, one starting at zero and another starting at 0.25
3. Merge these two lists into tuples

import numpy as np  
len_pred = len(Predictions) 
range_1 = np.arange(0,len_pred,0.25)
range_2 = np.arange(0.25,len_pred,0.25)
new_range = zip(range_1,range_2)  

Here I'm able to get the ranges, but I'm missing the labels.
It seems like a simple problem, but I'm running in circles.

Please advise. Thanks.

Seirra asked Dec 19 '22 01:12


2 Answers

You can iterate through the list and create a range whenever you detect a change in value. With this method you also need to account for the final range after the loop ends. It might not be super clean, but it should be effective.

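# `predictions` is the list from the question; it must not be empty.
current_time = 0
range_start = 0
current_value = predictions[0]
ranges = []
for p in predictions:
  if p != current_value:
    # The value changed, so close the range that just ended.
    ranges.append([range_start, current_time, 'high' if current_value == 1 else 'not high'])
    range_start = current_time
    current_value = p
  # Each prediction covers one 0.25-second frame.
  current_time += .25
# Close the final range, which the loop never appends.
ranges.append([range_start, current_time, 'high' if current_value == 1 else 'not high'])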

Updated to fix a few off-by-one errors.
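For comparison, the same change-detection idea can be sketched with itertools.groupby from the standard library, which groups consecutive equal values. This is only an illustrative variant, not part of the answer above: the function name to_ranges, the FRAME_SECONDS constant and the LABELS mapping are hypothetical names, and the label text is taken from the question.

```python
from itertools import groupby

FRAME_SECONDS = 0.25  # duration of one prediction frame, as stated in the question
LABELS = {1: "High", 0: "Not-High"}  # hypothetical mapping; label text from the question


def to_ranges(predictions, frame_seconds=FRAME_SECONDS):
    """Collapse runs of equal predictions into [start, stop, label] rows."""
    ranges = []
    start_index = 0
    for value, run in groupby(predictions):
        # groupby yields each run of consecutive equal values; count its length.
        run_length = sum(1 for _ in run)
        stop_index = start_index + run_length
        # Times are derived from frame indices rather than accumulated one step at a time.
        ranges.append([start_index * frame_seconds,
                       stop_index * frame_seconds,
                       LABELS[value]])
        start_index = stop_index
    return ranges


# Shortened input: the first 26 predictions from the question.
predictions = [1] * 10 + [0] * 13 + [1] * 3
print(to_ranges(predictions))
# [[0.0, 2.5, 'High'], [2.5, 5.75, 'Not-High'], [5.75, 6.5, 'High']]
```

Unlike the loop above, this sketch returns an empty list for empty input instead of raising an IndexError.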

Steve answered Dec 27 '22 02:12


Using np.diff() and np.where(), you can find every index at which the value changes:

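import numpy as np

p = np.array([1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,0,1,0,1,0,1,0,1,1,1,1,1,0,0,0,1,1,1,1,1,1,0])

# Start index of every run, plus len(p) as the end of the last run.
idx = np.r_[0, np.where(np.diff(p) != 0)[0]+1, len(p)]
# Convert frame indices to seconds (0.25 s per frame).
t = idx * 0.25

# Columns: start time, stop time, value of the run.
np.c_[t[:-1], t[1:], p[idx[:-1]]]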

output:

array([[  0.  ,   2.5 ,   1.  ],
       [  2.5 ,   5.75,   0.  ],
       [  5.75,   6.5 ,   1.  ],
       [  6.5 ,   6.75,   0.  ],
       [  6.75,   7.  ,   1.  ],
       [  7.  ,   7.25,   0.  ],
       [  7.25,   7.5 ,   1.  ],
       [  7.5 ,   7.75,   0.  ],
       [  7.75,   8.  ,   1.  ],
       [  8.  ,   8.25,   0.  ],
       [  8.25,   9.5 ,   1.  ],
       [  9.5 ,  10.25,   0.  ],
       [ 10.25,  11.75,   1.  ],
       [ 11.75,  12.  ,   0.  ]])
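The array above stores the label as a number (and as a float, because np.c_ produces a single-dtype array). If the [start, stop, label] list with text labels from the question is wanted, one possible follow-up step is sketched below. It assumes the same idx and t computation as above; the short input array, the frame_seconds variable and the label_names mapping are illustrative assumptions, not part of the original answer.

```python
import numpy as np

p = np.array([1, 1, 1, 0, 0, 1])  # short illustrative input, not the lab's data
frame_seconds = 0.25              # frame duration stated in the question

# Same run-boundary computation as in the answer above.
idx = np.r_[0, np.where(np.diff(p) != 0)[0] + 1, len(p)]
t = idx * frame_seconds

# Hypothetical mapping from predicted value to the label text used in the question.
label_names = {1: "High", 0: "Not-High"}

# Build plain Python rows so each one can hold floats and a string together.
ranges = [
    [float(start), float(stop), label_names[int(value)]]
    for start, stop, value in zip(t[:-1], t[1:], p[idx[:-1]])
]
print(ranges)
# [[0.0, 0.75, 'High'], [0.75, 1.25, 'Not-High'], [1.25, 1.5, 'High']]
```

A plain list is used here because a NumPy array would have to fall back to a string or object dtype to mix numbers and text in one row.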
HYRY answered Dec 27 '22 02:12