I come from a biology background and am very new to Python and ML. Our lab has a black-box ML model that outputs a sequence like this:
Predictions =
[1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,0,1,0,1,0,1,0,1,1,1,1,1,0,0,0,1,1,1,1,1,1,0]
Each value represents a predicted time frame with a duration of 0.25 seconds.
1 means High.
0 means Not High.
How do I convert these predictions into a list of [start, stop, label] entries,
so that runs of identical values are grouped together? For example, the first 10 ones span 0 to 10 * 0.25 s, so the first range and label would be
[0.0, 2.5, High]
Next there are 13 zeroes, so start = 2.5, stop = 13 * 0.25 + 2.5 = 5.75, label = Not-High,
which gives
[2.5, 5.75, Not-High]
So the final result would be a list of lists: unique, non-overlapping intervals, each with a label, like:
[[0.0,2.5, High],
[2.5, 5.75, Not-High],
[5.75,6.50, High] ..
What I tried:
1. Count number of values in Predictions
2. Generate two ranges, one starting at 0 and another starting at 0.25
3. Merge these two lists into tuples
import numpy as np
len_pred = len(Predictions)
range_1 = np.arange(0,len_pred,0.25)
range_2 = np.arange(0.25,len_pred,0.25)
new_range = zip(range_1,range_2)
This gives me the per-frame ranges, but I'm missing the labels and the grouping.
It seems like a simple problem, but I'm running in circles.
Please advise. Thanks.
You can iterate through the list and create a range when you detect a change. You'll also need to account for the final range when using this method. Might not be super clean but should be effective.
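current_time = 0                 # end time of the frames processed so far
range_start = 0                  # start time of the current run
current_value = predictions[0]   # value of the current run
ranges = []
for p in predictions:
    if p != current_value:
        # The value changed: close the run that just ended.
        ranges.append([range_start, current_time, 'high' if current_value == 1 else 'not high'])
        range_start = current_time
        current_value = p
    current_time += .25
# Close the final run, which the loop never appends.
ranges.append([range_start, current_time, 'high' if current_value == 1 else 'not high'])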
Updated to fix a few off-by-one errors.
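As an alternative to the loop above, here is a minimal sketch of the same grouping idea using itertools.groupby from the standard library. The function name group_predictions, the frame_seconds parameter, and the label strings are my own choices for illustration, not something from the question. It counts frames as integers and multiplies by the frame duration at the end, which avoids accumulating floating-point error from repeated += 0.25 on long sequences.

```python
from itertools import groupby

FRAME_SECONDS = 0.25  # duration of one prediction, as stated in the question


def group_predictions(predictions, frame_seconds=FRAME_SECONDS):
    """Collapse runs of equal values into [start, stop, label] lists.

    Hypothetical helper for illustration; assumes values are 0 or 1.
    """
    ranges = []
    start_frame = 0
    for value, run in groupby(predictions):
        run_length = sum(1 for _ in run)      # number of frames in this run
        stop_frame = start_frame + run_length
        label = "High" if value == 1 else "Not-High"
        ranges.append([start_frame * frame_seconds,
                       stop_frame * frame_seconds,
                       label])
        start_frame = stop_frame              # next run starts where this one ends
    return ranges


# First 27 values of the sequence in the question
predictions = [1] * 10 + [0] * 13 + [1, 1, 1, 0]
print(group_predictions(predictions))
```

An empty input simply returns an empty list, so no special case is needed for that.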
Using np.diff() and np.where(), you can find every index at which the value changes:
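import numpy as np
p = np.array([1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,0,1,0,1,0,1,0,1,1,1,1,1,0,0,0,1,1,1,1,1,1,0])
# Run boundaries: 0, every index where the value differs from the previous one, and len(p)
idx = np.r_[0, np.where(np.diff(p) != 0)[0]+1, len(p)]
# Convert frame indices to seconds
t = idx * 0.25
# Columns: start, stop, value of the run
np.c_[t[:-1], t[1:], p[idx[:-1]]]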
output:
array([[ 0. , 2.5 , 1. ],
[ 2.5 , 5.75, 0. ],
[ 5.75, 6.5 , 1. ],
[ 6.5 , 6.75, 0. ],
[ 6.75, 7. , 1. ],
[ 7. , 7.25, 0. ],
[ 7.25, 7.5 , 1. ],
[ 7.5 , 7.75, 0. ],
[ 7.75, 8. , 1. ],
[ 8. , 8.25, 0. ],
[ 8.25, 9.5 , 1. ],
[ 9.5 , 10.25, 0. ],
[ 10.25, 11.75, 1. ],
[ 11.75, 12. , 0. ]])
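The array above holds the label as 1. or 0. rather than as text. If you want the list-of-lists form from the question, one possible sketch is below. The function name to_labelled_ranges and the label strings are assumptions for illustration, and it assumes a non-empty input of 0s and 1s.

```python
import numpy as np


def to_labelled_ranges(p, frame_seconds=0.25):
    """Return [[start, stop, label], ...] for runs of equal values in p.

    Hypothetical helper; assumes p is non-empty and contains only 0 and 1.
    """
    p = np.asarray(p)
    # Run boundaries: 0, each index where the value changes, and len(p)
    idx = np.r_[0, np.where(np.diff(p) != 0)[0] + 1, len(p)]
    t = idx * frame_seconds
    # The value at the start of each run decides its label
    labels = ["High" if v == 1 else "Not-High" for v in p[idx[:-1]]]
    # float() converts NumPy scalars to plain Python floats
    return [[float(start), float(stop), label]
            for start, stop, label in zip(t[:-1], t[1:], labels)]


print(to_labelled_ranges([1, 1, 0, 0, 0, 1]))
```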