I have some points in the interval [0,20]
I have a window of size window_size=3
that I can move inside the above interval. Therefore the beginning of the window - let's call start
is constrained to [0,17]
.
Let's say we have some points
below:
points = [1.4,1.8, 11.3,11.8,12.3,13.2, 18.2,18.3,18.4,18.5]
If we wanted a minimum of min_points=4
points the solution of the start ranges of the windows (which I found manually ) are:
suitable_starts = [[10.2,11.3],[15.5,17.0]]
i.e. The start of the size 3
window can be from 10.2
to 11.3
and from 15.5
to 17.0
. Trivially, the corresponding end of the windows would just be +3
of the start ranges.
I am looking for a way to algorithmically cover this quickly with clever numpy
or scipy
or other functionality.
The general function I'm looking for is:
get_start_windows(interval = [0,20],
window_size = 3.0,
points = [1.4,1.8,11.3,11.8,12.3,13.2,18.2,18.3,18.4,18.5],
min_points = 4
return suitable_starts # suitable_starts = [[10.2,11.3],[15.5,17.0]]
Note:
As someone in the comments has pointed out there are special cases sometimes when points are exactly window_size apart. However in reality the points are double floats where it is impossible for them to exactly window_size apart so these can be ignored.
These special examples include:
points = [1.4,1.8, 11.3,11.8,12.3,13.2,14.2,15.2,16.2,17.2,18.2,18.3,18.4,18.5]
but these can be safely ignored.
maximum() function is used to find the element-wise maximum of array elements. It compares two arrays and returns a new array containing the element-wise maxima.
NumPy Arrays are faster than Python Lists because of the following reasons: An array is a collection of homogeneous data-types that are stored in contiguous memory locations. On the other hand, a list in Python is a collection of heterogeneous data types stored in non-contiguous memory locations.
After a bit of struggle I came up with this solution.
First a bit of explanations, and order of thoughts:
min_points
are in the window, and finish count when min_points
no longer inside it (imagine it as a convultion oprtator or so)min_points
, which means on every occurance of element or window_size
below it (as optional_starts
reflects)optional_starts
and sample the first time condition mets, and the last one that condition mets for each intervalso the following code was written as described above:
def consist_at_least(start, points, min_points, window_size):
a = [point for point in points if start <= point <= start + window_size]
return len(a)>=min_points
points = [1.4,1.8, 11.3,11.8,12.3,13.2, 18.2,18.3,18.4,18.5]
min_points = 4
window_size = 3
total_interval = [0,20]
optional_starts = points + [item-window_size for item in points if item-window_size>=total_interval[0]] + [total_interval[0] + window_size] + [total_interval[1] - window_size] + [total_interval[0]]
optional_starts = [item for item in optional_starts if item<=total_interval[1]-window_size]
intervals = []
potential_ends = []
for start in sorted(optional_starts):
is_start_interval = len(intervals)%2 == 0
if consist_at_least(start, points, min_points, window_size):
if is_start_interval:
intervals.append(start)
else:
potential_ends.append(start)
elif len(potential_ends)>0 :
intervals.append(potential_ends[-1])
potential_ends = []
if len(potential_ends)>0:
intervals.append(potential_ends[-1])
print(intervals)
output:
[10.2, 11.3, 15.5, 17]
Each 2 consequtive elements reflects start and end of interval
So, after additional information were given regarding the nature of the "intervals", I propose the following solution, which assumes inter-interval distances of at least window_size
:
import numpy as np
def get_start_windows(inter, ws, p, mp):
# Initialize list of suitable start ranges
start_ranges = []
# Determine possible intervals w.r.t. to window size
int_start = np.insert(np.array([0, p.shape[0]]), 1,
(np.argwhere(np.diff(p) > ws) + 1).squeeze()).tolist()
# Iterate found intervals
for i in np.arange(len(int_start)-1):
# The actual interval
int_ = p[int_start[i]:int_start[i+1]]
# If interval has less than minimum points, reject
if int_.shape[0] < mp:
continue
# Determine first and last possible starting point
first = max(inter[0], int_[mp-1] - ws)
last = min(int_[-mp], inter[1] - ws)
# Add to list of suitable start ranges
start_ranges.append((first, last))
return start_ranges
# Example 1
interval = [0, 20]
window_size = 3.0
min_points = 4
points = [1.4, 1.8, 11.3, 11.8, 12.3, 13.2, 18.2, 18.3, 18.4, 18.5]
print(get_start_windows(interval, window_size, np.array(points), min_points))
# Example 2
points = [1.4, 1.8, 1.9, 2.1, 11.3, 11.8, 12.3, 13.2, 18.2, 18.3, 18.4, 18.5]
print(get_start_windows(interval, window_size, np.array(points), min_points))
# Example 3
points = [1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2.0, 2.1, 3.49]
print(get_start_windows(interval, window_size, np.array(points), min_points))
(Code might be optimized, I didn't pay attention to that...)
Output:
[(10.2, 11.3), (15.5, 17.0)]
[(0, 1.4), (10.2, 11.3), (15.5, 17.0)]
[(0, 1.9)]
Hopefully, the desired cases are covered by that solution.
-------------------------------------
System information
-------------------------------------
Platform: Windows-10-10.0.16299-SP0
Python: 3.8.5
NumPy: 1.19.2
-------------------------------------
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With