Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Count of islands of negative and positive numbers in a NumPy array

I have an array containing chunks of negative and chunks of positive elements. A much simplified example of it would be an array a looking like: array([-3, -2, -1, 1, 2, 3, 4, 5, 6, -5, -4])

(a<0).sum() and (a>0).sum() give me the total number of negative and positive elements but how do I count these in order? By this I mean I want to know that my array contains first 3 negative elements, 6 positive and 2 negative.

This sounds like a topic that have been addressed somewhere, and there may be a duplicate out there, but I can't find one.

A method is to use numpy.roll(a,1) in a loop over the whole array and count the number of elements of a given sign appearing in e.g. the first element of the array as it rolls, but it doesn't look much numpyic (or pythonic) nor very efficient to me.

like image 449
calocedrus Avatar asked Oct 27 '25 20:10

calocedrus


1 Answers

Here's one vectorized approach -

def pos_neg_counts(a):
    mask = a>0
    idx = np.flatnonzero(mask[1:] != mask[:-1])
    count = np.concatenate(( [idx[0]+1], idx[1:] - idx[:-1], [a.size-1-idx[-1]] ))
    if a[0]<0:
        return count[1::2], count[::2] # pos, neg counts
    else:
        return count[::2], count[1::2] # pos, neg counts

Sample runs -

In [155]: a
Out[155]: array([-3, -2, -1,  1,  2,  3,  4,  5,  6, -5, -4])

In [156]: pos_neg_counts(a)
Out[156]: (array([6]), array([3, 2]))

In [157]: a[0] = 3

In [158]: a
Out[158]: array([ 3, -2, -1,  1,  2,  3,  4,  5,  6, -5, -4])

In [159]: pos_neg_counts(a)
Out[159]: (array([1, 6]), array([2, 2]))

In [160]: a[-1] = 7

In [161]: a
Out[161]: array([ 3, -2, -1,  1,  2,  3,  4,  5,  6, -5,  7])

In [162]: pos_neg_counts(a)
Out[162]: (array([1, 6, 1]), array([2, 1]))

Runtime test

Other approach(es) -

# @Franz's soln        
def split_app(my_array):
    negative_index = my_array<0
    splits = np.split(negative_index, np.where(np.diff(negative_index))[0]+1)
    len_list = [len(i) for i in splits]
    return len_list

Timings on bigger dataset -

In [20]: # Setup input array
    ...: reps = np.random.randint(3,10,(100000))
    ...: signs = np.ones(len(reps),dtype=int)
    ...: signs[::2] = -1
    ...: a = np.repeat(signs, reps)*np.random.randint(1,9,reps.sum())
    ...: 

In [21]: %timeit split_app(a)
10 loops, best of 3: 90.4 ms per loop

In [22]: %timeit pos_neg_counts(a)
100 loops, best of 3: 2.21 ms per loop
like image 179
Divakar Avatar answered Oct 30 '25 12:10

Divakar



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!