Fast way to find length and start index of repeated elements in array

Tags: python, numpy

I have an array A:

import numpy as np
A = np.array( [0, 0, 1, 1, 1, 0, 1, 1, 0 ,0, 1, 0] )

The lengths of the runs of consecutive 1s would be:

output: [3, 2, 1]

with the corresponding starting indices:

idx = [2, 6, 10]

The original arrays are huge, so I would prefer a solution with fewer for-loops.

Edit (Run time):

import numpy as np
import time

A = np.array( [0, 0, 1, 1, 1, 0, 1, 1, 0 ,0, 1, 0] )

def LoopVersion(A):
    l_A = len(A)
    size = []
    idx = []
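    temp_idx = None  # start index of the current run; None means "not inside a run"
    temp_size = []
    for i in range(l_A):
        if A[i] == 1:
            temp_size.append(1)
            if temp_idx is None:  # `not temp_idx` would misfire for a run starting at index 0
                temp_idx = i
                idx.append(temp_idx)
        else:
            size.append( len(temp_size) )
            size = [i for i in size if i != 0]
            temp_size = []
            temp_idx = None
    if temp_size:  # a run that reaches the end of A is never closed by a 0
        size.append( len(temp_size) )
    return size, idx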

Quang's solution:

def UniqueVersion(A):
    _, idx, counts = np.unique(np.cumsum(1-A)*A, return_index=True, return_counts=True)
    # Label 0 collects every 0 in A (and a leading run of 1s, if any), so drop it.
    return idx[1:], counts[1:]

Jacco's solution:

def ConcatVersion(A):
    A = np.concatenate(([0], A, [0]))  #  get rid of some edge cases
    starts = np.argwhere((A[:-1] + A[1:]) == 1).ravel()[::2]
    ends = np.argwhere((A[:-1] + A[1:]) == 1).ravel()[1::2]
    len_of_repeats = ends - starts
    return starts, len_of_repeats

Dan's solution (works with special cases as well):

def structure(A):
    ZA = np.concatenate(([0], A, [0]))
    indices = np.flatnonzero( ZA[1:] != ZA[:-1] )
    counts = indices[1:] - indices[:-1]
    return indices[::2], counts[::2]

Run time analysis with 10000 elements:

np.random.seed(1234)
B = np.random.randint(2, size=10000)


start = time.time()
size, idx = LoopVersion(B)
end = time.time()
print ( (end - start) )
# 0.32489800453186035 seconds

start = time.time()
idx, counts = UniqueVersion(B)
end = time.time()
print ( (end - start) )
# 0.008305072784423828 seconds

start = time.time()
idx, counts = ConcatVersion(B)
end = time.time()
print ( (end - start) )
# 0.0009801387786865234 seconds

start = time.time()
idx, counts = structure(B)
end = time.time()
print ( (end - start) )
# 0.000347137451171875 seconds
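
A single `time.time()` difference can be noisy for calls that take well under a millisecond. As a rough sketch (not part of the original measurements), the same comparison could be repeated with the standard `timeit` module. The `structure` function is copied from above so that the snippet runs on its own; the repeat counts are arbitrary choices.

```python
import timeit
import numpy as np

def structure(A):
    # Copied from above so that this snippet is self-contained.
    ZA = np.concatenate(([0], A, [0]))
    indices = np.flatnonzero(ZA[1:] != ZA[:-1])
    counts = indices[1:] - indices[:-1]
    return indices[::2], counts[::2]

rng = np.random.RandomState(1234)  # local generator instead of the global seed
B = rng.randint(2, size=10000)

# Best of 5 repeats, each repeat averaging over 100 calls (arbitrary numbers).
best = min(timeit.repeat(lambda: structure(B), number=100, repeat=5)) / 100
print(f"structure: {best:.6f} s per call")
```

The other versions can be timed the same way by swapping the function inside the `lambda`.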


asked Oct 08 '20 15:10

HaraldKo

3 Answers

Let's try unique:

_, idx, counts = np.unique(np.cumsum(1-A)*A, return_index=True, return_counts=True)

# your expected output (the first entry belongs to the label 0, i.e. the zeros, so it is dropped):
idx[1:], counts[1:]

Output:

(array([ 2,  6, 10]), array([3, 2, 1]))
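
To see why this works, it may help to print the intermediate arrays. The following is only an illustrative walk-through on the array from the question; the names `zeros_seen`, `labels`, and `lead` are made up for this sketch.

```python
import numpy as np

A = np.array([0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 0])

zeros_seen = np.cumsum(1 - A)  # running count of zeros; constant inside each run of 1s
labels = zeros_seen * A        # keep that count on the 1s, force 0 wherever A is 0
print(labels)                  # [0 0 2 2 2 0 3 3 0 0 5 0]

values, idx, counts = np.unique(labels, return_index=True, return_counts=True)
print(values)                  # [0 2 3 5]   <- first entry is the label 0 (the zeros)
print(idx[1:], counts[1:])     # [ 2  6 10] [3 2 1]

# Caveat: a run of 1s at the very start also gets label 0 and is merged with the zeros.
lead = np.array([1, 1, 0, 1])
lead_labels = np.cumsum(1 - lead) * lead
print(lead_labels)             # [0 0 0 1]
```

Every run of 1s gets its own label because the count of zeros seen so far only changes between runs. The last lines show the limitation: if the array starts with 1s, that first run cannot be told apart from the zeros.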

answered Oct 21 '22 10:10

Quang Hoang

Here is a pedestrian attempt that solves the problem by coding its description directly.

We prepend and append a zero to A, giving a vector ZA, then detect the alternating islands of 1s and 0s in ZA by comparing the shifted versions ZA[1:] and ZA[:-1]. (From the resulting arrays we take the even positions, which correspond to the runs of ones in A.)

import numpy as np

def structure(A):
    ZA = np.concatenate(([0], A, [0]))
    indices = np.flatnonzero( ZA[1:] != ZA[:-1] )
    counts = indices[1:] - indices[:-1]
    return indices[::2], counts[::2]

Some sample runs:

In [71]: structure(np.array( [0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 0] ))
Out[71]: (array([ 2,  6, 10]), array([3, 2, 1]))

In [72]: structure(np.array( [1, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 0, 1] ))
Out[72]: (array([ 0,  5,  9, 13, 15]), array([3, 3, 2, 1, 1]))

In [73]: structure(np.array( [1, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 0] ))
Out[73]: (array([0, 5, 9]), array([3, 3, 2]))

In [74]: structure(np.array( [1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1] ))
Out[74]: (array([ 0,  2,  5,  7, 11, 14]), array([1, 2, 1, 3, 2, 3]))
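
For readers who want to follow the steps, here is the same computation unrolled on the array from the question, with the intermediate arrays printed. This is just a trace of the function above, not a different method.

```python
import numpy as np

A = np.array([0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 0])

ZA = np.concatenate(([0], A, [0]))           # zero padding on both sides
indices = np.flatnonzero(ZA[1:] != ZA[:-1])  # positions in A where a new island begins
counts = indices[1:] - indices[:-1]          # lengths of the alternating 1- and 0-islands
print(indices)                     # [ 2  5  6  8 10 11]
print(counts)                      # [3 1 2 2 1]
print(indices[::2], counts[::2])   # [ 2  6 10] [3 2 1]
```

Because ZA starts with 0, the first change is always the start of a 1-island, so the even positions of `indices` are the starts of the 1-islands and the odd positions are where each one ends. The same alternation makes the even positions of `counts` the lengths of the 1-islands.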

answered Oct 21 '22 09:10

dan_fulea

You can use the fact that the indexes of '1s' provide all information you need. It's enough to find starts and ends of series of '1s'.

A = np.concatenate(([0], A, [0]))  #  get rid of some edge cases
diff = np.argwhere((A[:-1] + A[1:]) == 1).ravel()
starts = diff[::2]
ends = diff[1::2]
    
print(starts, ends - starts)
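
A closely related variant, sketched here as an alternative and not taken from the answer above, uses `np.diff` on the padded array: the difference is +1 where a run of 1s starts and -1 just after it ends. It assumes A holds only 0 and 1 in a signed integer dtype; the name `padded` is made up for this sketch.

```python
import numpy as np

A = np.array([0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 0])

padded = np.concatenate(([0], A, [0]))  # zero padding handles runs at either end
d = np.diff(padded)                     # +1 at a run start, -1 one past a run end
starts = np.flatnonzero(d == 1)
ends = np.flatnonzero(d == -1)
print(starts, ends - starts)            # [ 2  6 10] [3 2 1]
```

Separating starts from ends by sign avoids relying on the even/odd alternation of the change points.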

answered Oct 21 '22 10:10

jaco2554
