numpy: boolean indexing and memory usage

Question

Consider the following numpy code:

A[start:end] = B[mask]

Here:

A and B are 2D arrays with the same number of columns;
start and end are scalars;
mask is a 1D boolean array;
(end - start) == sum(mask).
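A concrete instance of this setup may help. The arrays below are made-up illustrative data, not from the question. The try block shows why the last condition matters: if the slice length differs from mask.sum(), NumPy refuses the assignment.

```python
import numpy as np

# Hypothetical example data: B has 6 rows, A has 5; both have 3 columns.
B = np.arange(18, dtype=float).reshape(6, 3)
A = np.zeros((5, 3))
mask = np.array([True, False, True, True, False, False])

start, end = 1, 4
assert end - start == mask.sum()  # the precondition from the question

A[start:end] = B[mask]  # rows 0, 2 and 3 of B land in rows 1..3 of A
print(A)

# With a mismatched slice, NumPy cannot broadcast (3, 3) into (2, 3).
try:
    A[start:end - 1] = B[mask]
except ValueError as exc:
    print("ValueError:", exc)
```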

In principle, the above operation can be carried out using O(1) temporary storage, by copying elements of B directly into A.

Is this what actually happens in practice, or does numpy construct a temporary array for B[mask]? If the latter, is there a way to avoid this by rewriting the statement?

Sven Marnach · Accepted Answer

The line

A[start:end] = B[mask]

will, according to the Python language definition, first evaluate the right-hand side, yielding a new array that contains the selected rows of B and occupies additional memory. The most efficient pure-Python way I'm aware of to avoid this is an explicit loop:

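from itertools import compress
# compress() yields each selected row of B as a view, so rows are copied
# into A one at a time. (Python 2: use itertools.izip and xrange instead.)
for i, b in zip(range(start, end), compress(B, mask)):
    A[i] = b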

Of course, this will be much slower than your original code, but it uses only O(1) additional memory. Also note that itertools.compress() requires Python 2.7, or Python 3.1 or later.
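A possible middle ground, not part of the answer above, is to process B in blocks of rows. Each block's boolean selection still creates a temporary, but its size is bounded by the block size rather than by sum(mask), and there are far fewer Python-level iterations than in the row-by-row loop. The helper name copy_masked_rows and the default chunk size are hypothetical choices; this is a sketch, not a tuned implementation.

```python
import numpy as np

def copy_masked_rows(A, B, mask, start, chunk=4096):
    """Hypothetical helper: copy the rows of B selected by mask into A,
    beginning at row `start`, one block of at most `chunk` rows at a time."""
    pos = start
    for lo in range(0, len(B), chunk):
        hi = min(lo + chunk, len(B))
        # B[lo:hi] is a view; boolean-indexing it copies at most `chunk` rows.
        block = B[lo:hi][mask[lo:hi]]
        A[pos:pos + len(block)] = block
        pos += len(block)
    return pos  # one past the last row written, i.e. `end`

# Usage: compare with the one-line version on random data.
rng = np.random.default_rng(0)
B = rng.random((1000, 5))
mask = rng.random(1000) < 0.3
start = 10
end = start + int(mask.sum())

A1 = np.zeros((end + 5, 5))
A2 = np.zeros_like(A1)
A1[start:end] = B[mask]
stop = copy_masked_rows(A2, B, mask, start, chunk=64)
print(stop == end, np.array_equal(A1, A2))  # True True
```

Larger chunks mean fewer loop iterations; smaller chunks mean less temporary memory.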

tillsten · Answer

Using a boolean array as an index is fancy (advanced) indexing, so NumPy has to make a copy. If you run into memory problems, you could write a Cython extension to deal with it.
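The copy is easy to observe with np.shares_memory (NumPy 1.11 or later). The second half is a lower-memory alternative that is an assumption of this edit, not something from either answer: build only an index array with np.flatnonzero and let np.take write the rows straight into the view A[start:end]. The np.take documentation notes that out is always buffered when mode='raise', so mode='clip' is used here; since all indices are valid, clipping changes nothing else. The index array still costs memory proportional to sum(mask), so this is not O(1), only much smaller than B[mask] when rows are wide. The data is illustrative.

```python
import numpy as np

# Illustrative data (not from the question).
B = np.arange(20, dtype=float).reshape(5, 4)
mask = np.array([False, True, True, False, True])
A = np.zeros((6, 4))
start = 2
end = start + int(mask.sum())

# Boolean indexing returns a new array that shares no memory with B,
# whereas a basic slice of A is a view into A's buffer.
print(np.shares_memory(B[mask], B))       # False
print(np.shares_memory(A[start:end], A))  # True

# Assumed lower-memory variant: only an index array is allocated, and
# np.take writes the selected rows directly into the view A[start:end].
idx = np.flatnonzero(mask)  # sum(mask) integers, not sum(mask) full rows
np.take(B, idx, axis=0, out=A[start:end], mode="clip")
print(np.array_equal(A[start:end], B[mask]))  # True
```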

Tags: python, memory-management, large-data, numpy

NPE
