Fast, python-ish way of ranking chunks of 1's in numpy array?

Question

I have a numpy array consisting of 0's and 1's. Each sequence of 1's within the array stands for occurrence of one event. I want to label elements corresponding to an event with event-specific ID number (and the rest of array elements with np.nan) I surely can do that in a loop, but is there more "python-ish" (fast, vectorized) way of doing it?

Example of numpy array with 3 events I want to label.

import numpy as np 
arr = np.array([0,0,0,1,1,1,0,0,0,1,1,0,0,0,1,1,1,1])
some_func(arr)

# Expected output of some_func I search for: 
# [np.nan,np.nan,np.nan,0,0,0,np.nan,np.nan,np.nan,1,1,np.nan,np.nan,np.nan,2,2,2,2]

Divakar · Accepted Answer

You want to label and luckily, there's one with SciPy, scipy.ndimage.label -

In [43]: from scipy.ndimage import label

In [47]: out = label(arr)[0]

In [48]: np.where(arr==0,np.nan,out-1)
Out[48]: 
array([nan, nan, nan,  0.,  0.,  0., nan, nan, nan,  1.,  1., nan, nan,
       nan,  2.,  2.,  2.,  2.])

Another with some NumPy work -

def rank_chunks(arr):
    m = np.r_[False,arr.astype(bool)]
    idx = np.flatnonzero(m[:-1] < m[1:])
    id_ar = np.zeros(len(arr),dtype=float)
    id_ar[idx[1:]] = 1
    out = id_ar.cumsum()
    out[arr==0] = np.nan
    return out

Another with masking + np.repeat -

def rank_chunks_v2(arr):
    m = np.r_[False,arr.astype(bool),False]
    idx = np.flatnonzero(m[:-1] != m[1:])
    l = idx[1::2]-idx[::2]
    out = np.full(len(arr),np.nan,dtype=float)
    out[arr!=0] = np.repeat(np.arange(len(l)),l)
    return out

Timings (tiling given input to 1Mx) -

In [153]: arr_big = np.tile(arr,1000000)

In [154]: %timeit np.where(arr_big==0,np.nan,label(arr_big)[0]-1)
     ...: %timeit rank_chunks(arr_big)
     ...: %timeit rank_chunks_v2(arr_big)
1 loop, best of 3: 312 ms per loop
1 loop, best of 3: 263 ms per loop
1 loop, best of 3: 229 ms per loop

Laurens Koppenol · Answer

A really cool way to do this is using the DBSCAN clustering algorithm. It might not be the most efficient for this specific task, BUT it is resilient in case you want to give a minimum number of 1's per event, or allow a gap of a certain number of zeroes within an event.

from sklearn.cluster import DBSCAN
import numpy as np

max_gap = 1
min_samples = 1

# Get indices of every element that belongs to a certain event
input_values = np.array([0,0,0,1,1,1,0,0,0,1,1,0,0,0,1,1,1,1])  
positives_indices = np.where(input_values > 0)[0]

# Turn the indices into a 2D array of so called 'examples'
X = positives_indices.reshape(-1, 1)

# Train a model and transform the data in one
clustering = DBSCAN(eps=max_gap, min_samples=min_samples) \
    .fit_predict(X)

# Get results, yields (index, event_id)
zip(X, clustering)

Fast, python-ish way of ranking chunks of 1's in numpy array?

Tags:

python

arrays

list-comprehension

numpy

Marta Karas

2 Answers

Divakar

Laurens Koppenol

Recent Activity

Donate For Us

Fast, python-ish way of ranking chunks of 1's in numpy array?

Tags:

python

arrays

list-comprehension

numpy

Marta Karas

2 Answers

Divakar

Laurens Koppenol

Related questions

Recent Activity

Donate For Us