Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Numpy finding interval which has a least k points

I have some points in the interval [0,20]

I have a window of size window_size=3 that I can move inside the above interval. Therefore the beginning of the window - let's call start is constrained to [0,17].

Let's say we have some points below:

points = [1.4,1.8,   11.3,11.8,12.3,13.2,  18.2,18.3,18.4,18.5]

If we wanted a minimum of min_points=4 points the solution of the start ranges of the windows (which I found manually ) are:

suitable_starts = [[10.2,11.3],[15.5,17.0]]

i.e. The start of the size 3 window can be from 10.2 to 11.3 and from 15.5 to 17.0. Trivially, the corresponding end of the windows would just be +3 of the start ranges.

I am looking for a way to algorithmically cover this quickly with clever numpy or scipy or other functionality.

The general function I'm looking for is:

get_start_windows(interval = [0,20],
    window_size = 3.0, 
    points = [1.4,1.8,11.3,11.8,12.3,13.2,18.2,18.3,18.4,18.5],
    min_points = 4

    return suitable_starts # suitable_starts = [[10.2,11.3],[15.5,17.0]]

Note:

As someone in the comments has pointed out there are special cases sometimes when points are exactly window_size apart. However in reality the points are double floats where it is impossible for them to exactly window_size apart so these can be ignored.

These special examples include:

points = [1.4,1.8,  11.3,11.8,12.3,13.2,14.2,15.2,16.2,17.2,18.2,18.3,18.4,18.5]

but these can be safely ignored.

like image 279
pinnochio Avatar asked Oct 02 '20 10:10

pinnochio


People also ask

Which function helps find the maximum value number in NumPy?

maximum() function is used to find the element-wise maximum of array elements. It compares two arrays and returns a new array containing the element-wise maxima.

Is NumPy more efficient than list?

NumPy Arrays are faster than Python Lists because of the following reasons: An array is a collection of homogeneous data-types that are stored in contiguous memory locations. On the other hand, a list in Python is a collection of heterogeneous data types stored in non-contiguous memory locations.


Video Answer


2 Answers

After a bit of struggle I came up with this solution.

First a bit of explanations, and order of thoughts:

  • Ideally we would want to set a window size and slide it from the most left acceptable point until the most right acceptable point, and start counting when min_points are in the window, and finish count when min_points no longer inside it (imagine it as a convultion oprtator or so)
  • the basic pitfall is that we want to discrete the sliding, so the trick here is to check only when amount of points can fall under or up higher than min_points, which means on every occurance of element or window_size below it (as optional_starts reflects)
  • then to iterate over optional_starts and sample the first time condition mets, and the last one that condition mets for each interval

so the following code was written as described above:

def consist_at_least(start, points, min_points, window_size):
    a = [point for point in points if start <= point <= start + window_size]
    return len(a)>=min_points
    


points = [1.4,1.8,   11.3,11.8,12.3,13.2,  18.2,18.3,18.4,18.5]
min_points = 4
window_size = 3
total_interval = [0,20]
optional_starts = points + [item-window_size for item in points if item-window_size>=total_interval[0]] + [total_interval[0] + window_size] + [total_interval[1] - window_size] + [total_interval[0]]
optional_starts = [item for item in optional_starts if item<=total_interval[1]-window_size]
intervals = []
potential_ends = []
for start in sorted(optional_starts):
    is_start_interval = len(intervals)%2 == 0
    if consist_at_least(start, points, min_points, window_size):
        if is_start_interval:
            intervals.append(start)
        else:
            potential_ends.append(start)
    elif len(potential_ends)>0 :
        intervals.append(potential_ends[-1])
        potential_ends = []
if len(potential_ends)>0:
    intervals.append(potential_ends[-1])

print(intervals)

output:

[10.2, 11.3, 15.5, 17]

Each 2 consequtive elements reflects start and end of interval

like image 61
Yossi Levi Avatar answered Oct 19 '22 15:10

Yossi Levi


So, after additional information were given regarding the nature of the "intervals", I propose the following solution, which assumes inter-interval distances of at least window_size:

import numpy as np


def get_start_windows(inter, ws, p, mp):

    # Initialize list of suitable start ranges
    start_ranges = []

    # Determine possible intervals w.r.t. to window size
    int_start = np.insert(np.array([0, p.shape[0]]), 1,
                          (np.argwhere(np.diff(p) > ws) + 1).squeeze()).tolist()

    # Iterate found intervals
    for i in np.arange(len(int_start)-1):

        # The actual interval
        int_ = p[int_start[i]:int_start[i+1]]

        # If interval has less than minimum points, reject
        if int_.shape[0] < mp:
            continue

        # Determine first and last possible starting point
        first = max(inter[0], int_[mp-1] - ws)
        last = min(int_[-mp], inter[1] - ws)

        # Add to list of suitable start ranges
        start_ranges.append((first, last))

    return start_ranges


# Example 1
interval = [0, 20]
window_size = 3.0
min_points = 4
points = [1.4, 1.8, 11.3, 11.8, 12.3, 13.2, 18.2, 18.3, 18.4, 18.5]
print(get_start_windows(interval, window_size, np.array(points), min_points))

# Example 2
points = [1.4, 1.8, 1.9, 2.1, 11.3, 11.8, 12.3, 13.2, 18.2, 18.3, 18.4, 18.5]
print(get_start_windows(interval, window_size, np.array(points), min_points))

# Example 3
points = [1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2.0, 2.1, 3.49]
print(get_start_windows(interval, window_size, np.array(points), min_points))

(Code might be optimized, I didn't pay attention to that...)

Output:

[(10.2, 11.3), (15.5, 17.0)]
[(0, 1.4), (10.2, 11.3), (15.5, 17.0)]
[(0, 1.9)]

Hopefully, the desired cases are covered by that solution.

-------------------------------------
System information
-------------------------------------
Platform:   Windows-10-10.0.16299-SP0
Python:     3.8.5
NumPy:      1.19.2
-------------------------------------
like image 2
HansHirse Avatar answered Oct 19 '22 15:10

HansHirse