Setup
Consider the numpy array a
>>> np.random.seed([3,1415])
>>> a = np.random.choice([True, False], (4, 8))
>>> a
array([[ True, False, True, False, True, True, False, True],
[False, False, False, False, True, False, False, True],
[False, True, True, True, True, True, True, True],
[ True, True, True, False, True, False, False, False]], dtype=bool)
Question
For each column, I want to compute the cumulative equivalent of all.
The result should look like this:
array([[ True, False, True, False, True, True, False, True],
[False, False, False, False, True, False, False, True],
[False, False, False, False, True, False, False, True],
[False, False, False, False, True, False, False, False]], dtype=bool)
Take the first column
a[:, 0]
# Original First Column
array([ True, False, False, True], dtype=bool)
# So far so good
# \ False from here on
# | /---------------\
array([ True, False, False, False], dtype=bool)
# Cumulative all
So basically, cumulative all is True as long as we have True, and turns False from the first False onward.
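To pin down the definition, here is a minimal pure-Python sketch of a cumulative all over one column. The helper name cumulative_all_1d is hypothetical, written only for illustration; it is a reference implementation, not a fast one.

```python
import numpy as np

def cumulative_all_1d(col):
    """Reference (slow) cumulative all: True until the first False, then False."""
    out = np.empty(len(col), dtype=bool)
    seen_false = False
    for i, value in enumerate(col):
        if not value:
            seen_false = True      # once a False is seen, everything after is False
        out[i] = not seen_false
    return out

# The first column from the example above
col = np.array([True, False, False, True])
result = cumulative_all_1d(col)
print(result)
```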
What I have tried
I can get the result with
a.cumprod(0).astype(bool)
But I can't help but wonder if it's necessary to perform each and every multiplication when I know everything will be False from the first False I see.
Consider the larger 1-D array
b = np.array(list('111111111110010101010101010101010101010011001010101010101')).astype(int).astype(bool)
I contend that these two produce the same answer
bool(b.prod())
and
b.all()
But b.all() can short circuit while b.prod() does not. If I time them:
%timeit bool(b.prod())
%timeit b.all()
100000 loops, best of 3: 2.05 µs per loop
1000000 loops, best of 3: 1.45 µs per loop
b.all() is quicker. This implies that there must be a way to conduct a cumulative all that is quicker than my a.cumprod(0).astype(bool)
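As a quick sanity check of the claim that the two reductions agree, here is a small sketch on a shortened version of b (the shorter string and the all_ones array are assumptions chosen for illustration):

```python
import numpy as np

# Shortened stand-in for the longer b above
b = np.array(list('1110010101')).astype(int).astype(bool)

# Both reductions ask "is every element True?"
same = bool(b.prod()) == bool(b.all())
print(same)

# An all-True array should give True either way
all_ones = np.ones(5, dtype=bool)
print(bool(all_ones.prod()), bool(all_ones.all()))
```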
All ufuncs have 5 methods: reduce, accumulate, reduceat, outer, and at. In this case, use accumulate since it returns the result of cumulative applications of the ufunc:
In [41]: np.logical_and.accumulate(a, axis=0)
Out[50]:
array([[ True, False, True, False, True, True, False, True],
[False, False, False, False, True, False, False, True],
[False, False, False, False, True, False, False, True],
[False, False, False, False, True, False, False, False]], dtype=bool)
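To confirm that the two approaches agree, here is a small check that rebuilds the 4x8 array from the question by hand (written as 1/0 for compactness, so it does not depend on the random seed) and compares both results:

```python
import numpy as np

# The array from the question, typed out explicitly (1 = True, 0 = False)
a = np.array([[1, 0, 1, 0, 1, 1, 0, 1],
              [0, 0, 0, 0, 1, 0, 0, 1],
              [0, 1, 1, 1, 1, 1, 1, 1],
              [1, 1, 1, 0, 1, 0, 0, 0]]).astype(bool)

via_accumulate = np.logical_and.accumulate(a, axis=0)
via_cumprod = a.cumprod(0).astype(bool)

print(np.array_equal(via_accumulate, via_cumprod))
print(via_accumulate.dtype)  # the result stays boolean, no astype needed
```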
In [60]: np.random.seed([3,1415])
In [61]: a = np.random.choice([True, False], (400, 80))
In [57]: %timeit np.logical_and.accumulate(a, axis=0)
10000 loops, best of 3: 85.6 µs per loop
In [59]: %timeit a.cumprod(0).astype(bool)
10000 loops, best of 3: 138 µs per loop
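Outside IPython, the same comparison can be sketched with the standard timeit module. The seed, the repeat count n, and the helper names with_accumulate and with_cumprod are assumptions made for this example; the absolute numbers will vary by machine and NumPy version.

```python
import timeit
import numpy as np

np.random.seed(0)  # arbitrary seed, chosen only for repeatability
a = np.random.choice([True, False], (400, 80))

def with_accumulate():
    return np.logical_and.accumulate(a, axis=0)

def with_cumprod():
    return a.cumprod(0).astype(bool)

n = 200  # number of calls to average over
t_accumulate = timeit.timeit(with_accumulate, number=n) / n
t_cumprod = timeit.timeit(with_cumprod, number=n) / n

print(f"accumulate: {t_accumulate * 1e6:.1f} µs per call")
print(f"cumprod:    {t_cumprod * 1e6:.1f} µs per call")
```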