
Count the number of occurrences between markers in a python list

I have a boolean (NumPy) array, and I want to count how many consecutive occurrences of True there are between the Falses.

E.g., for a sample list:

b_List = [T,T,T,F,F,F,F,T,T,T,F,F,T,F] 

should produce

ml = [3,3,1]

My initial attempt was this snippet:

i = 0
ml = []
for el in b_List:
  if (b_List):
    i += 1
  ml.append(i)
  i = 0

But it keeps appending elements in ml for each F in the b_List.
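
For reference, here is a minimal sketch of how the loop above could be corrected. It is not part of the original question, and it assumes the intent is to record a count only when a run of True values ends. The two changes are testing the element `el` rather than the whole list, and appending only on a False that closes a non-empty run (plus once more after the loop, in case the list ends with True).

```python
T, F = True, False
b_List = [T, T, T, F, F, F, F, T, T, T, F, F, T, F]

ml = []
i = 0
for el in b_List:
    if el:
        # Still inside a run of True values: keep counting.
        i += 1
    elif i:
        # A False ends a non-empty run: record its length and reset.
        ml.append(i)
        i = 0
if i:
    # The list ended while a run was still open.
    ml.append(i)

print(ml)
```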

EDIT

Thank you all for your answers. Sadly I can't accept all the answers as correct. I've accepted Akavall's answer because he referred to my initial attempt (I know what I did wrong now) and also compared Mark's and Ashwini's posts.

Please don't take the accepted solution as the definitive answer, since both of the other suggestions introduce alternative methods that work equally well.

user528025 asked Dec 04 '13 19:12


2 Answers

itertools.groupby provides one easy way to do this:

>>> import itertools
>>> T, F = True, False
>>> b_List = [T,T,T,F,F,F,F,T,T,T,F,F,T,F]
>>> [len(list(group)) for value, group in itertools.groupby(b_List) if value]
[3, 3, 1]
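
The one-liner above can also be wrapped in a small helper. This is only a sketch: the function name `count_true_runs` is made up for illustration, and `sum(1 for _ in group)` is swapped in for `len(list(group))` so that each run is counted without building a temporary list.

```python
from itertools import groupby


def count_true_runs(flags):
    """Return the length of each run of consecutive truthy values."""
    # groupby yields (value, group) pairs for each run of equal values;
    # only the runs whose value is truthy are kept.
    return [sum(1 for _ in group) for value, group in groupby(flags) if value]


T, F = True, False
print(count_true_runs([T, T, T, F, F, F, F, T, T, T, F, F, T, F]))
```

Because groupby only looks at adjacent equal values, this also handles input that starts with False or ends with True.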
Mark Dickinson answered Nov 14 '22 22:11


Using NumPy:

>>> import numpy as np
>>> a = np.array([ True,  True,  True, False, False, False, False,  True,  True, True, False, False,  True, False], dtype=bool)
>>> np.diff(np.insert(np.where(np.diff(a)==1)[0]+1, 0, 0))[::2]
array([3, 3, 1])

>>> a = np.array([True, False, False, True, True, False, False, True, False])
>>> np.diff(np.insert(np.where(np.diff(a)==1)[0]+1, 0, 0))[::2]
array([1, 2, 1])

Can't say that this is the best NumPy solution, but it is still faster than itertools.groupby:

>>> from itertools import groupby
>>> lis = [ True,  True,  True, False, False, False, False,  True,  True, True, False, False,  True, False]*1000
>>> a = np.array(lis)
>>> %timeit [len(list(group)) for value, group in groupby(lis) if value]
100 loops, best of 3: 9.58 ms per loop
>>> %timeit np.diff(np.insert(np.where(np.diff(a)==1)[0]+1, 0, 0))[::2]
1000 loops, best of 3: 1.4 ms per loop

>>> lis = [ True,  True,  True, False, False, False, False,  True,  True, True, False, False,  True, False]*10000
>>> a = np.array(lis)
>>> %timeit [len(list(group)) for value, group in groupby(lis) if value]
1 loops, best of 3: 95.5 ms per loop
>>> %timeit np.diff(np.insert(np.where(np.diff(a)==1)[0]+1, 0, 0))[::2]
100 loops, best of 3: 14.9 ms per loop
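
The `%timeit` lines above only run inside IPython. As a rough sketch, a similar comparison can be made in a plain script with the standard-library `timeit` module. The function names `with_groupby` and `with_numpy` are made up for illustration, and the numbers will differ from the ones quoted above depending on the machine and library versions.

```python
import timeit
from itertools import groupby

import numpy as np

pattern = [True, True, True, False, False, False, False,
           True, True, True, False, False, True, False]
lis = pattern * 1000
a = np.array(lis)


def with_groupby():
    return [len(list(group)) for value, group in groupby(lis) if value]


def with_numpy():
    # Same expression as in the answer; valid here because the data
    # starts with True and ends with False.
    return np.diff(np.insert(np.where(np.diff(a) == 1)[0] + 1, 0, 0))[::2]


# Check that both approaches agree before timing them.
assert with_groupby() == with_numpy().tolist()

for func in (with_groupby, with_numpy):
    # Best of 3 repeats, 10 calls per repeat, reported per call.
    seconds = min(timeit.repeat(func, number=10, repeat=3)) / 10
    print(f"{func.__name__}: {seconds * 1e3:.3f} ms per call")
```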

As @justhalf and @Mark Dickinson pointed out in the comments, the above code does not work in some cases (for example, when the array starts with False or ends with True), so you need to pad the array with False on both ends first:

In [28]: a                                                                                        
Out[28]: 
array([ True,  True,  True, False, False, False, False,  True,  True,
        True, False, False,  True, False], dtype=bool)

In [29]: np.diff(np.where(np.diff(np.hstack([False, a, False])))[0])[::2]
Out[29]: array([3, 3, 1])
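
The padded version can be spelled out step by step as a function. This is a sketch of the same idea rather than the exact expression above: the name `true_run_lengths_np` is made up for illustration, and the transitions are found with an explicit `!=` comparison instead of `np.diff`.

```python
import numpy as np


def true_run_lengths_np(mask):
    """Return the length of each run of consecutive True values."""
    mask = np.asarray(mask, dtype=bool)
    # Pad with False so every run of True has both a start and an end.
    padded = np.concatenate(([False], mask, [False]))
    # Positions where the value changes. They alternate between the
    # start of a run (False -> True) and its end (True -> False).
    edges = np.flatnonzero(padded[1:] != padded[:-1])
    return edges[1::2] - edges[::2]


a = np.array([True, True, True, False, False, False, False,
              True, True, True, False, False, True, False])
print(true_run_lengths_np(a))

# Cases the unpadded expression gets wrong:
print(true_run_lengths_np([False, True, True, False, True]))
print(true_run_lengths_np([True, False, True]))
```

With the padding in place, the number of edges is always even, so the start and end slices have the same length.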
Ashwini Chaudhary answered Nov 14 '22 22:11