Consider the following numpy code:
A[start:end] = B[mask]
Here:
A and B are 2D arrays with the same number of columns;
start and end are scalars;
mask is a 1D boolean array;
(end - start) == sum(mask).
In principle, the above operation can be carried out using O(1) temporary storage, by copying elements of B directly into A.
Is this what actually happens in practice, or does numpy construct a temporary array for B[mask]? If the latter, is there a way to avoid this by rewriting the statement?
The line A[start:end] = B[mask] will, according to the Python language definition, first evaluate the right-hand side, yielding a new array that contains the selected rows of B and occupies additional memory. The most efficient pure-Python way I'm aware of to avoid this is to use an explicit loop:
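from itertools import compress

# zip() is lazy in Python 3; on Python 2, use itertools.izip instead.
for i, b in zip(range(start, end), compress(B, mask)):
    A[i] = b  # copy one selected row of B into A at a time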
Of course this will be much less time-efficient than your original code, but it only uses O(1) additional memory. Also note that itertools.compress() is available in Python 2.7 or 3.1 or above.
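A middle ground between the one-line assignment and the row-by-row loop is to copy in blocks, so that the temporary never holds more than a fixed number of rows. The following is only a sketch under that assumption; the helper name masked_copy_chunked and the chunk_rows parameter are made up for illustration and are not part of numpy.

```python
import numpy as np

def masked_copy_chunked(A, B, mask, start, chunk_rows=1024):
    """Copy the rows of B selected by mask into A, beginning at row `start`.

    Hypothetical helper: each step builds a temporary of at most
    chunk_rows rows instead of the full B[mask].
    Returns the row index in A just past the last row written.
    """
    pos = start
    for lo in range(0, B.shape[0], chunk_rows):
        block_mask = mask[lo:lo + chunk_rows]
        # Boolean indexing still copies, but only within this block of B.
        selected = B[lo:lo + chunk_rows][block_mask]
        A[pos:pos + len(selected)] = selected
        pos += len(selected)
    return pos

# Small usage example with made-up data.
rng = np.random.default_rng(0)
B = rng.random((10, 3))
mask = rng.random(10) < 0.5
A = np.zeros((int(mask.sum()) + 2, 3))
end = masked_copy_chunked(A, B, mask, start=1, chunk_rows=4)
```

Larger values of chunk_rows mean fewer Python-level iterations at the cost of a bigger temporary, so the parameter trades speed against peak memory.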
Using a boolean array as an index is fancy indexing, so numpy needs to make a copy. If you are running into memory problems, you could write a Cython extension to deal with it.
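Another thing that may be worth trying before writing an extension is numpy.compress, which accepts an out argument, so the selected rows can be placed into a slice of A. This is a sketch rather than a guarantee: I am assuming that A[start:end] has the same dtype as B and the right shape, and I have not verified that numpy avoids every internal temporary here (it still needs some bookkeeping for the mask).

```python
import numpy as np

# Made-up example data; A and B share dtype and column count.
B = np.arange(12.0).reshape(6, 2)
mask = np.array([True, False, True, True, False, False])
A = np.zeros((5, 2))
start, end = 1, 4  # end - start == mask.sum()

# Select the rows of B where mask is True and place them in A[start:end].
# A[start:end] is a view of A, so the result ends up in A itself.
np.compress(mask, B, axis=0, out=A[start:end])
```

After this call, A[start:end] holds the same rows as B[mask], and the rows of A outside that range are untouched.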