
How do I identify sequences of values in a boolean array?

I have a long boolean array:

bool_array = [ True, True, True, True, True, False, False, False, False, False, True, True, True, False, False, True, True, True, True, False, False, False, False, False, False, False ]

I need to figure out where the value flips, i.e., the indices where runs of True and False begin (plus the length of the array as a final end marker). In this particular case, I would want to get

index = [0, 5, 10, 13, 15, 19, 26]

Is there an easy way to do this without manually looping to compare every ith element with the (i+1)th?

saud asked Apr 27 '16 15:04



2 Answers

As an approach that avoids an explicit index loop and scales to large datasets, in Python 3.x you can use the accumulate and groupby functions from the itertools module.

>>> from itertools import accumulate, groupby
>>> [0] + list(accumulate(sum(1 for _ in g) for _,g in groupby(bool_array)))
[0, 5, 10, 13, 15, 19, 26]

The logic behind the code:

This code groups consecutive duplicate items with groupby(), then loops over the iterator that groupby() returns. That iterator yields pairs of a key (discarded here by binding it to the throwaway name _) and an iterator over the items in that group.

>>> [list(g) for _, g in groupby(bool_array)]
[[True, True, True, True, True], [False, False, False, False, False], [True, True, True], [False, False], [True, True, True, True], [False, False, False, False, False, False, False]]

So all we need is to compute the length of each group and add it to the total of the preceding lengths. That running total is the index of the first item of the next group, which is exactly where the value changes, and computing running totals is what accumulate() is for.
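The steps above can be wrapped in a small function. This is a minimal sketch; the name run_starts is hypothetical and not part of the original answer.

```python
from itertools import accumulate, groupby

def run_starts(values):
    """Return the start index of every run of equal values, followed by len(values)."""
    # Length of each run of consecutive equal items.
    run_lengths = (sum(1 for _ in group) for _, group in groupby(values))
    # The running total of the lengths is the index where each following run starts.
    return [0] + list(accumulate(run_lengths))

bool_array = [True] * 5 + [False] * 5 + [True] * 3 + [False] * 2 + [True] * 4 + [False] * 7
print(run_starts(bool_array))  # [0, 5, 10, 13, 15, 19, 26]
```

Note that for an empty input this sketch returns [0], since the leading 0 is always prepended.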

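In NumPy you can compare each element with its neighbour (arr is bool_array converted to a NumPy array; != is used because NumPy does not support the - operator on boolean arrays):

In [17]: import numpy as np
In [18]: arr = np.array(bool_array)
In [19]: np.where(arr[1:] != arr[:-1])[0] + 1
Out[19]: array([ 5, 10, 13, 15, 19])
# With leading and trailing indices
In [22]: np.concatenate(([0], np.where(arr[1:] != arr[:-1])[0] + 1, [arr.size]))
Out[22]: array([ 0,  5, 10, 13, 15, 19, 26])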
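As a self-contained variant of the NumPy approach, the sketch below wraps the neighbour comparison in a function. The name change_points and the with_ends parameter are hypothetical, chosen only for illustration. It compares neighbours with != instead of subtracting them, because recent NumPy versions raise a TypeError for the - operator on boolean arrays.

```python
import numpy as np

def change_points(values, with_ends=False):
    """Return the indices at which a new run of True or False begins."""
    arr = np.asarray(values, dtype=bool)
    # changed[i] is True when arr[i + 1] differs from arr[i].
    changed = arr[1:] != arr[:-1]
    # Shift by one so each index points at the first element of the new run.
    starts = np.flatnonzero(changed) + 1
    if with_ends:
        # Optionally add the leading 0 and the trailing array length.
        starts = np.concatenate(([0], starts, [arr.size]))
    return starts

bool_array = [True] * 5 + [False] * 5 + [True] * 3 + [False] * 2 + [True] * 4 + [False] * 7
print(change_points(bool_array))                  # [ 5 10 13 15 19]
print(change_points(bool_array, with_ends=True))  # [ 0  5 10 13 15 19 26]
```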
Mazdak answered Sep 27 '22 23:09


This will tell you where the values change:

>>> import numpy as np
>>> np.argwhere(np.diff(bool_array)).squeeze()
array([ 4,  9, 12, 14, 18])

np.diff compares each element with the next. For boolean input the result stays boolean: it is True where two consecutive elements differ and False where they are equal.

The np.argwhere function then tells you where the values are True, which are now the changes. Each index returned is the last element before a flip, so add 1 to get the index where the new run begins.
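To get the run-start indices the question asks for, the result can be shifted by one. The sketch below is one way to do that and is not the answerer's code. It uses np.flatnonzero in place of argwhere(...).squeeze(), since squeeze() collapses the result to a 0-d array when there is exactly one change.

```python
import numpy as np

bool_array = [True] * 5 + [False] * 5 + [True] * 3 + [False] * 2 + [True] * 4 + [False] * 7

# Indices i where bool_array[i] differs from bool_array[i + 1].
changes = np.flatnonzero(np.diff(bool_array))
# Shift by one to point at the first element of each new run.
starts = changes + 1

print(changes)  # [ 4  9 12 14 18]
print(starts)   # [ 5 10 13 15 19]
```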

DilithiumMatrix answered Sep 28 '22 00:09