Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Fast way to find length and start index of repeated elements in array

Tags:

python

numpy

I have an array A:

import numpy as np
A = np.array( [0, 0, 1, 1, 1, 0, 1, 1, 0 ,0, 1, 0] )

The length of consecutive '1s' would be:

output: [3, 2, 1]

with the corresponding starting indices:

idx = [2, 6, 10]

The original arrays are huge and I prefer a solution with less for-loop.

Edit (Run time):

import numpy as np
import time

A = np.array( [0, 0, 1, 1, 1, 0, 1, 1, 0 ,0, 1, 0] )

def LoopVersion(A):
    l_A = len(A)
    size = []
    idx = []
    temp_idx = []
    temp_size = []
    for i in range(l_A):
        if A[i] == 1:
            temp_size.append(1)
            if not temp_idx:
                temp_idx = i
                idx.append(temp_idx)
        else:
            size.append( len(temp_size) )
            size = [i for i in size if i != 0]
            temp_size = []
            temp_idx = []
    return size, idx

Quang's solution:

def UniqueVersion(A):
    _, idx, counts = np.unique(np.cumsum(1-A)*A, return_index=True, return_counts=True)
    return idx, counts

Jacco's solution:

def ConcatVersion(A):
    A = np.concatenate(([0], A, [0]))  #  get rid of some edge cases
    starts = np.argwhere((A[:-1] + A[1:]) == 1).ravel()[::2]
    ends = np.argwhere((A[:-1] + A[1:]) == 1).ravel()[1::2]
    len_of_repeats = ends - starts
    return starts, len_of_repeats

Dan's solution (works with special cases as well):

def structure(A):
    ZA = np.concatenate(([0], A, [0]))
    indices = np.flatnonzero( ZA[1:] != ZA[:-1] )
    counts = indices[1:] - indices[:-1]
    return indices[::2], counts[::2]

Run time analysis with 10000 elements:

np.random.seed(1234)
B = np.random.randint(2, size=10000)


start = time.time()
size, idx = LoopVersion(B)
end = time.time()
print ( (end - start) )
# 0.32489800453186035 seconds

start = time.time()
idx, counts = UniqueVersion(B)
end = time.time()
print ( (end - start) )
# 0.008305072784423828 seconds

start = time.time()
idx, counts = ConcatVersion(B)
end = time.time()
print ( (end - start) )
# 0.0009801387786865234 seconds

start = time.time()
idx, counts = structure(B)
end = time.time()
print ( (end - start) )
# 0.000347137451171875 seconds
like image 215
HaraldKo Avatar asked Oct 08 '20 15:10

HaraldKo


People also ask

How to find the number of repeating elements in an array?

In the outer loop, pick elements one by one and count the number of occurrences of the picked element in the inner loop. This method doesn’t use the other useful data provided in questions like range of numbers is between 1 to n and there are only two repeating elements. Traverse the array once.

How to get the last index of an array in Java?

Then to get the last index (if that is what you want) use LastOrDefault (), as apose to ToArrray. Show activity on this post. You can have a class that implements IEnumerable and returns the indices you want:

How do you find the last occurrence of an array?

We traverse array from beginning to find first occurrence. If element is present, then we traverse from end also to find last occurrence. return "Only one key is present "."

How to find all distinct elements in an array efficiently?

Group the array elements by occurrence sequence Find longest subarray with 0 sum Longest consecutive subsequence Count distinct elements in every window of given size Check if two arrays are permutations of each other Find all distinct symmetric pairs in an array efficiently Check pair with given product in an array


3 Answers

Let's try unique:

_, idx, counts = np.unique(np.cumsum(1-A)*A, return_index=True, return_counts=True)

# your expected output:
idx, counts

Output:

(array([ 2,  6, 10]), array([3, 2, 1]))
like image 88
Quang Hoang Avatar answered Oct 21 '22 10:10

Quang Hoang


Here is a pedestrian try, solving the problem by programming the problem.

We prepend and also append a zero to A, getting a vector ZA, then detect the 1 islands, and the 0 islands coming in alternating manner in the ZA by comparing the shifted versions ZA[1:] and ZA[-1]. (In the constructed arrays we take the even places, corresponding to the ones in A.)

import numpy as np

def structure(A):
    ZA = np.concatenate(([0], A, [0]))
    indices = np.flatnonzero( ZA[1:] != ZA[:-1] )
    counts = indices[1:] - indices[:-1]
    return indices[::2], counts[::2]

Some sample runs:

In [71]: structure(np.array( [0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 0] ))
Out[71]: (array([ 2,  6, 10]), array([3, 2, 1]))

In [72]: structure(np.array( [1, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 0, 1] ))
Out[72]: (array([ 0,  5,  9, 13, 15]), array([3, 3, 2, 1, 1]))

In [73]: structure(np.array( [1, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 0] ))
Out[73]: (array([0, 5, 9]), array([3, 3, 2]))

In [74]: structure(np.array( [1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1] ))
Out[74]: (array([ 0,  2,  5,  7, 11, 14]), array([1, 2, 1, 3, 2, 3]))
like image 41
dan_fulea Avatar answered Oct 21 '22 09:10

dan_fulea


You can use the fact that the indexes of '1s' provide all information you need. It's enough to find starts and ends of series of '1s'.

A = np.concatenate(([0], A, [0]))  #  get rid of some edge cases
diff = np.argwhere((A[:-1] + A[1:]) == 1).ravel()
starts = diff[::2]
ends = diff[1::2]
    
print(starts, ends - starts)

like image 43
jaco2554 Avatar answered Oct 21 '22 10:10

jaco2554