Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

finding max number of consecutive elements using vectorization

As a part of my project I need to find if there are 4 or more consecutive elements in a vector and also their indices. currently I am using the following code:

#sample arrays:
#a1 = np.array([0, 1, 2, 3, 5])
#a2 = np.array([0, 1, 3, 4, 5, 6])
#a3 = np.array([0, 1, 3, 4, 5])
a4 = array([0, 1, 2, 4, 5, 6])

dd = np.diff(a4) #array([1, 1, 2, 1, 1])
c = 0
idx = []
for i in range(len(dd)):
    if dd[i]==1 and c<3:
        idx.append(i)
        c+=1
    elif dd[i]!=1 and c>=3:
        break
    else:
         c=0
         idx=[]

I am interested to see if it is possible to avoid for loop and just use numpy functions to do this task.

like image 318
Moj Avatar asked May 23 '26 04:05

Moj


2 Answers

This will give you an array with the length of all consecutive chunks:

np.diff(np.concatenate(([-1],) + np.nonzero(np.diff(a) != 1) + ([len(a)-1],)))

Some tests:

>>> a = [1, 2, 3, 4, 5, 6, 9, 10, 11, 14, 17, 18, 19, 20, 21]
>>> np.diff(np.concatenate(([-1],) + np.nonzero(np.diff(a) != 1) +
                           ([len(a)-1],)))
array([6, 3, 1, 5], dtype=int64)

>>> a = [0, 1, 2, 4, 5, 6]
>>> np.diff(np.concatenate(([-1],) + np.nonzero(np.diff(a) != 1) +
                           ([len(a)-1],)))
array([3, 3], dtype=int64)

To check if any is at least 4 items long, sinply wrap the above code in np.any(... >= 4).


To see how this works, lets work out the results from the inside out for my first example:

>>> a = [1, 2, 3, 4, 5, 6, 9, 10, 11, 14, 17, 18, 19, 20, 21]

First, we figure out the deltas between consecutive items:

>>> np.diff(a)
array([1, 1, 1, 1, 1, 3, 1, 1, 3, 3, 1, 1, 1, 1])

Then, we identify positions where the delta is not 1, that is, positions where a chunk of consecutive items begins or ends:

>>> np.diff(a) != 1
array([False, False, False, False, False,  True, False, False,  True,
        True, False, False, False, False], dtype=bool)

And we extract the positions of the Trues:

>>> np.nonzero(np.diff(a) != 1)
(array([5, 8, 9], dtype=int64),)

The above index marks last items in a consecutive streak. Python slices are defined as start to last+1, so we could increment that array by one, add a zero at the beginning and the length of the array at the end and have all starting and ending indices of consecutive sequences, i.e.:

>>> np.concatenate(([0], np.nonzero(np.diff(a) != 1)[0] + 1, [len(a)]))
array([ 0,  6,  9, 10, 15], dtype=int64)

Taking the differences from consecutive indices will give us the desired length of each consecutive chunk. Because all we care for is the differences, rather than adding one to the indices, in my original answer I chose to to prepend -1 and append len(a)-1:

>>> np.concatenate(([-1],) + np.nonzero(np.diff(a) != 1) + ([len(a)-1],))
array([-1,  5,  8,  9, 14], dtype=int64)
>>> np.diff(np.concatenate(([-1],) + np.nonzero(np.diff(a) != 1) +
                           ([len(a)-1],)))
array([6, 3, 1, 5], dtype=int64)

Say that in this array you identify that you want the indices of the chunk of 5 items, the one in position 3 of this array. To recover the start and stop indices of that chunk you can simply do:

>>> np.concatenate(([0], np.nonzero(np.diff(a) != 1)[0] + 1, [len(a)]))[3:3+2]
array([10, 15], dtype=int64)
>>> a[10:15]
[17, 18, 19, 20, 21]
like image 172
Jaime Avatar answered May 24 '26 18:05

Jaime


How about the following recursive solution? (only works for 1dim arrays, of course)

I find it sickingly elegant. I'm not saying you should use it, but I had a good time coming up with it.

import numpy as np

def is_consecutive(arr, n):
    if n <= len(arr) <= 1:
        return True
    if len(arr) < n:
        return False
    diffs1idx = np.where(np.diff(arr) == 1)[0]
    return is_consecutive(diffs1idx, n-1)

print is_consecutive([1,2], 3)  # False
print is_consecutive([1,2,3], 3)  # True
print is_consecutive([5,1,2,3], 3)  # True
print is_consecutive([4,9,1,5,7], 3)  # False
print is_consecutive([4,9,1,2,3, 7, 9], 3)  # True
print is_consecutive(np.arange(100), 100)  # True
print is_consecutive(np.append([666], np.arange(100)), 100)  # True
print is_consecutive(np.append([666], np.arange(100)), 101)  # False

(Please don't ask me how it works... I don't understand recursion...)

like image 30
shx2 Avatar answered May 24 '26 17:05

shx2



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!