numpy array set ones between two values, fast

Tags:

having been looking for solution for this problem for a while but can't seem to find anything.

For example, I have an numpy array of

[ 0,  0,  2,  3,  2,  4,  3,  4,  0,  0, -2, -1, -4, -2, -1, -3, -4,  0,  2,  3, -2, -1,  0]

what I would like to achieve is the generate another array to indicate the elements between a pair of numbers, let's say between 2 and -2 here. So I want to get an array like this

[ 0,  0,  1,  1,  1,  1,  1,  1,  1,  1,  1,  0,  0,  0,  0,  0,  0,  0,  1,  1,  1,  0,  0]

Notice any 2 or -2 between a pair of (2, -2) are ignored. Any easy approach is to iterate through each element with for loop and identifies first occurrence of 2 and set everything after that to 1 until you hit an -2 and start look for the next 2 again.

But I would like this process to be faster as I have over 1000 elements in an numpy array. and this process needs to be done a lot of times. Do you guys know any elegant way to solve this? Thanks in advance!

464

asked Feb 19 '16 15:02

FiniteElement

2 Answers

Quite a problem that is! Listed in this post is a vectorized solution (hopefully the inlined comments would help to explain the logic behind it). I am assuming A as the input array with T1, T2 as the start and stop triggers.

def setones_between_triggers(A,T1,T2):    

    # Get start and stop indices corresponding to rising and falling triggers
    start = np.where(A==T1)[0]
    stop = np.where(A==T2)[0]

    # Take care of boundary conditions for np.searchsorted to work
    if (stop[-1] < start[-1]) & (start[-1] != A.size-1):
        stop = np.append(stop,A.size-1)

    # This is where the magic happens.
    # Validate (filter out) the triggers based on the set conditions :
    # 1. See if there are more than one stop indices between two start indices.
    # If so, use the first one and rejecting all others in that in-between space.
    # 2. Repeat the same check for start, but use the validated start indices.

    # First off, take care of out-of-bound cases for proper indexing
    stop_valid_idx = np.unique(np.searchsorted(stop,start,'right'))
    stop_valid_idx = stop_valid_idx[stop_valid_idx < stop.size]

    stop_valid = stop[stop_valid_idx]
    _,idx = np.unique(np.searchsorted(stop_valid,start,'left'),return_index=True)
    start_valid = start[idx]

    # Create shifts array (array filled with zeros, unless triggered by T1 and T2 
    # for which we have +1 and -1 as triggers). 
    shifts = np.zeros(A.size,dtype=int)
    shifts[start_valid] = 1
    shifts[stop_valid] = -1

    # Perform cumm. summation that would almost give us the desired output
    out = shifts.cumsum()

    # For a worst case when we have two groups of (T1,T2) adjacent to each other, 
    # set the negative trigger position as 1 as well
    out[stop_valid] = 1    
    return out

Sample runs

Original sample case :

In [1589]: A
Out[1589]: 
array([ 0,  0,  2,  3,  2,  4,  3,  4,  0,  0, -2, -1, -4, -2, -1, -3, -4,
        0,  2,  3, -2, -1,  0])

In [1590]: setones_between_triggers(A,2,-2)
Out[1590]: array([0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0])

Worst case #1 (adjacent (2,-2) groups) :

In [1595]: A
Out[1595]: 
array([-2,  2,  0,  2, -2,  2,  2,  2,  4, -2,  0, -2, -2, -4, -2, -1,  2,
       -4,  0,  2,  3, -2, -2,  0])

In [1596]: setones_between_triggers(A,2,-2)
Out[1596]: 
array([0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0,
       0], dtype=int32)

Worst case #2 (2 without any -2 till end) :

In [1603]: A
Out[1603]: 
array([-2,  2,  0,  2, -2,  2,  2,  2,  4, -2,  0, -2, -2, -4, -2, -1, -2,
       -4,  0,  2,  3,  5,  6,  0])

In [1604]: setones_between_triggers(A,2,-2)
Out[1604]: 
array([0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1,
       1], dtype=int32)

125

answered Oct 02 '22 04:10

Divakar

Assuming you have got a huge dataset, I prefer to do a pair of initial searches for the two boundaries then use for-loop on these indices for validation.

def between_pairs(x, b1, b2):
    # output vector
    out = np.zeros_like(x)

    # reversed list of indices for possible rising and trailing edges
    rise_edges = list(np.argwhere(x==b1)[::-1,0])
    trail_edges = list(np.argwhere(x==b2)[::-1,0])

    # determine the rising trailing edge pairs
    rt_pairs = []
    t = None
    # look for the next rising edge after the previous trailing edge
    while rise_edges:
        r = rise_edges.pop()
        if t is not None and r < t:
            continue

        # look for the next trailing edge after previous rising edge
        while trail_edges:
            t = trail_edges.pop()
            if t > r:
                rt_pairs.append((r, t))
                break

    # use the rising, trailing pairs for updating d
    for rt in rt_pairs:
        out[rt[0]:rt[1]+1] = 1
    return out
# Example
a = np.array([0,  0,  2,  3,  2,  4,  3,  4,  0,  0, -2, -1, -4, -2, -1, -3, -4,
        0,  2,  3, -2, -1,  0])
d = between_pairs(a , 2, -2)
print repr(d)

## -- End pasted text --
array([0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0])

I did a speed comparison with the alternative answer given by @CactusWoman

def between_vals(x, val1, val2):
    out = np.zeros(x.shape, dtype = int)
    in_range = False
    for i, v in enumerate(x):
        if v == val1 and not in_range:
            in_range = True
        if in_range:
            out[i] = 1
        if v == val2 and in_range:
            in_range = False
    return out

I found the following

In [59]: a = np.random.choice(np.arange(-5, 6), 2000)

In [60]: %timeit between_vals(a, 2, -2)
1000 loops, best of 3: 681 µs per loop

In [61]: %timeit between_pairs(a, 2, -2)
1000 loops, best of 3: 182 µs per loop

and for a much smaller dataset,

In [72]: a = np.random.choice(np.arange(-5, 6), 50)

In [73]: %timeit between_vals(a, 2, -2)
10000 loops, best of 3: 17 µs per loop

In [74]: %timeit between_pairs(a, 2, -2)
10000 loops, best of 3: 34.7 µs per loop

Therefore it all depends on your dataset size.

answered Oct 02 '22 04:10

hashmuke

Related questions
                            
                                monte carlo simulation of protein structure and grid
                            
                                Finding which packages support Python 3.x vs 2.7.x
                            
                                Python cannot import module from subdirectory even with a file named __init.py__ in the directory
                            
                                Python - delete blank lines of text at the end of the file
                            
                                How to find the breakpoint numbers in pdb (ipdb)?
                            
                                Use Python to send keystrokes to games in Windows?
                            
                                SQLAlchemy session error: InvalidRequestError
                            
                                Match rows in one Pandas dataframe to another based on three columns
                            
                                How to document Python packages using Sphinx
                            
                                Using multiprocessing pool from celery task raises exception
                            
                                How to apply function to dataframe in place
                            
                                Raise to 1/3 gives complex number
                            
                                installing PyGObject via pip in virtualenv [duplicate]
                            
                                How can I pass null to an external library, using ctypes, with an argument declared with ctypeslib.ndpointer?
                            
                                Zigzag or wavy lines in matplotlib
                            
                                Pycharm: terminate all running processes
                            
                                Correct way to declare an image field, sqlalchemy
                            
                                Why does the OrderedDict keys view compare order-insensitive?
                            
                                Using InitSpider with splash: only parsing the login page?
                            
                                Why attribute lookup in Python is designed this way (precedence chain)?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

numpy array set ones between two values, fast

Tags:

performance

python

arrays

numpy

FiniteElement

People also ask

2 Answers

Divakar

hashmuke

Recent Activity

Donate For Us