Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

numpy array set ones between two values, fast

having been looking for solution for this problem for a while but can't seem to find anything.

For example, I have an numpy array of

[ 0,  0,  2,  3,  2,  4,  3,  4,  0,  0, -2, -1, -4, -2, -1, -3, -4,  0,  2,  3, -2, -1,  0]

what I would like to achieve is the generate another array to indicate the elements between a pair of numbers, let's say between 2 and -2 here. So I want to get an array like this

[ 0,  0,  1,  1,  1,  1,  1,  1,  1,  1,  1,  0,  0,  0,  0,  0,  0,  0,  1,  1,  1,  0,  0]

Notice any 2 or -2 between a pair of (2, -2) are ignored. Any easy approach is to iterate through each element with for loop and identifies first occurrence of 2 and set everything after that to 1 until you hit an -2 and start look for the next 2 again.

But I would like this process to be faster as I have over 1000 elements in an numpy array. and this process needs to be done a lot of times. Do you guys know any elegant way to solve this? Thanks in advance!

like image 464
FiniteElement Avatar asked Feb 19 '16 15:02

FiniteElement


People also ask

How can I speed up my NumPy operation?

By explicitly declaring the "ndarray" data type, your array processing can be 1250x faster. This tutorial will show you how to speed up the processing of NumPy arrays using Cython. By explicitly specifying the data types of variables in Python, Cython can give drastic speed increases at runtime.

How do you extract all numbers between a given range from a NumPy array?

Using the logical_and() method The logical_and() method from the numpy package accepts multiple conditions or expressions as a parameter. Each of the conditions or the expressions should return a boolean value. These boolean values are used to extract the required elements from the array.

What does .all do in NumPy?

all() in Python. The numpy. all() function tests whether all array elements along the mentioned axis evaluate to True.


2 Answers

Quite a problem that is! Listed in this post is a vectorized solution (hopefully the inlined comments would help to explain the logic behind it). I am assuming A as the input array with T1, T2 as the start and stop triggers.

def setones_between_triggers(A,T1,T2):    

    # Get start and stop indices corresponding to rising and falling triggers
    start = np.where(A==T1)[0]
    stop = np.where(A==T2)[0]

    # Take care of boundary conditions for np.searchsorted to work
    if (stop[-1] < start[-1]) & (start[-1] != A.size-1):
        stop = np.append(stop,A.size-1)

    # This is where the magic happens.
    # Validate (filter out) the triggers based on the set conditions :
    # 1. See if there are more than one stop indices between two start indices.
    # If so, use the first one and rejecting all others in that in-between space.
    # 2. Repeat the same check for start, but use the validated start indices.

    # First off, take care of out-of-bound cases for proper indexing
    stop_valid_idx = np.unique(np.searchsorted(stop,start,'right'))
    stop_valid_idx = stop_valid_idx[stop_valid_idx < stop.size]

    stop_valid = stop[stop_valid_idx]
    _,idx = np.unique(np.searchsorted(stop_valid,start,'left'),return_index=True)
    start_valid = start[idx]

    # Create shifts array (array filled with zeros, unless triggered by T1 and T2 
    # for which we have +1 and -1 as triggers). 
    shifts = np.zeros(A.size,dtype=int)
    shifts[start_valid] = 1
    shifts[stop_valid] = -1

    # Perform cumm. summation that would almost give us the desired output
    out = shifts.cumsum()

    # For a worst case when we have two groups of (T1,T2) adjacent to each other, 
    # set the negative trigger position as 1 as well
    out[stop_valid] = 1    
    return out

Sample runs

Original sample case :

In [1589]: A
Out[1589]: 
array([ 0,  0,  2,  3,  2,  4,  3,  4,  0,  0, -2, -1, -4, -2, -1, -3, -4,
        0,  2,  3, -2, -1,  0])

In [1590]: setones_between_triggers(A,2,-2)
Out[1590]: array([0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0])

Worst case #1 (adjacent (2,-2) groups) :

In [1595]: A
Out[1595]: 
array([-2,  2,  0,  2, -2,  2,  2,  2,  4, -2,  0, -2, -2, -4, -2, -1,  2,
       -4,  0,  2,  3, -2, -2,  0])

In [1596]: setones_between_triggers(A,2,-2)
Out[1596]: 
array([0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0,
       0], dtype=int32)

Worst case #2 (2 without any -2 till end) :

In [1603]: A
Out[1603]: 
array([-2,  2,  0,  2, -2,  2,  2,  2,  4, -2,  0, -2, -2, -4, -2, -1, -2,
       -4,  0,  2,  3,  5,  6,  0])

In [1604]: setones_between_triggers(A,2,-2)
Out[1604]: 
array([0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1,
       1], dtype=int32)
like image 125
Divakar Avatar answered Oct 02 '22 04:10

Divakar


Assuming you have got a huge dataset, I prefer to do a pair of initial searches for the two boundaries then use for-loop on these indices for validation.

def between_pairs(x, b1, b2):
    # output vector
    out = np.zeros_like(x)

    # reversed list of indices for possible rising and trailing edges
    rise_edges = list(np.argwhere(x==b1)[::-1,0])
    trail_edges = list(np.argwhere(x==b2)[::-1,0])

    # determine the rising trailing edge pairs
    rt_pairs = []
    t = None
    # look for the next rising edge after the previous trailing edge
    while rise_edges:
        r = rise_edges.pop()
        if t is not None and r < t:
            continue

        # look for the next trailing edge after previous rising edge
        while trail_edges:
            t = trail_edges.pop()
            if t > r:
                rt_pairs.append((r, t))
                break

    # use the rising, trailing pairs for updating d
    for rt in rt_pairs:
        out[rt[0]:rt[1]+1] = 1
    return out
# Example
a = np.array([0,  0,  2,  3,  2,  4,  3,  4,  0,  0, -2, -1, -4, -2, -1, -3, -4,
        0,  2,  3, -2, -1,  0])
d = between_pairs(a , 2, -2)
print repr(d)

## -- End pasted text --
array([0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0])

I did a speed comparison with the alternative answer given by @CactusWoman

def between_vals(x, val1, val2):
    out = np.zeros(x.shape, dtype = int)
    in_range = False
    for i, v in enumerate(x):
        if v == val1 and not in_range:
            in_range = True
        if in_range:
            out[i] = 1
        if v == val2 and in_range:
            in_range = False
    return out

I found the following

In [59]: a = np.random.choice(np.arange(-5, 6), 2000)

In [60]: %timeit between_vals(a, 2, -2)
1000 loops, best of 3: 681 µs per loop

In [61]: %timeit between_pairs(a, 2, -2)
1000 loops, best of 3: 182 µs per loop

and for a much smaller dataset,

In [72]: a = np.random.choice(np.arange(-5, 6), 50)

In [73]: %timeit between_vals(a, 2, -2)
10000 loops, best of 3: 17 µs per loop

In [74]: %timeit between_pairs(a, 2, -2)
10000 loops, best of 3: 34.7 µs per loop

Therefore it all depends on your dataset size.

like image 37
hashmuke Avatar answered Oct 02 '22 04:10

hashmuke