Suppose I have a NumPy array arr
that I want to element-wise filter (reduce) depending on the truth value of a (broadcastable) function, e.g.
I want to get only values below a certain threshold value k
:
def cond(x):
return x < k
There are a couple of methods, e.g.:
np.fromiter((x for x in arr if cond(x)), dtype=arr.dtype)
(which is a memory efficient version of using a list comprehension: np.array([x for x in arr if cond(x)])
because np.fromiter()
will produce a NumPy array directly, without the need to allocate an intermediate Python list
)arr[cond(arr)]
arr[np.nonzero(cond(arr))]
(or equivalently using np.where()
as it defaults to np.nonzero()
with only one condition)(The last two approaches to be accelerated with Cython or Numba)
Which is the fastest? What about memory efficiency?
(EDITED: To use directly np.nonzero()
instead of np.where()
as per @ShadowRanger comment)
Using a loop-based approach with single pass and copying, accelerated with Numba, offers the best overall trade-off in terms of speed, memory efficiency and flexibility.
If the execution of the condition function is sufficiently fast, two-passes (filter2_nb()
) may be faster, while they are more memory efficient regardless.
Also, for sufficiently large inputs, resizing instead of copying (filter_resize_xnb()
) leads to faster execution.
If the data type (and the condition function) is known ahead of time and can be compiled, the Cython acceleration seems to be faster. It is likely that a similar hard-coding of the condition would lead to comparable speed-up with Numba accerelation as well.
When it comes to NumPy-only based approaches, boolean masking or integer indexing are of comparable speed, and which one comes out faster depends largely on the filtering factor, i.e. the portion of values that passes through the filtering condition.
The np.fromiter()
approach is much slower (it would be off-charts in the plot), but does not produce large temporary objects.
Note that the following tests are meant to give some insights into the different approaches and should be taken with a grain of salt. The most relevant assumptions are that the condition is broadcastable and that it would eventually compute very fast.
def filter_fromiter(arr, cond):
return np.fromiter((x for x in arr if cond(x)), dtype=arr.dtype)
def filter_mask(arr, cond):
return arr[cond(arr)]
def filter_idx(arr, cond):
return arr[np.nonzero(cond(arr))]
4a. Using explicit looping, with single pass and final copying/resizing
%%cython -c-O3 -c-march=native -a
#cython: language_level=3, boundscheck=False, wraparound=False, initializedcheck=False, cdivision=True, infer_types=True
import numpy as np
cdef long NUM = 1048576
cdef long MAX_VAL = 1048576
cdef long K = 1048576 // 2
cdef int cond_cy(long x, long k=K):
return x < k
cdef size_t _filter_cy(long[:] arr, long[:] result, size_t size):
cdef size_t j = 0
for i in range(size):
if cond_cy(arr[i]):
result[j] = arr[i]
j += 1
return j
def filter_cy(arr):
result = np.empty_like(arr)
new_size = _filter_cy(arr, result, arr.size)
return result[:new_size].copy()
def filter_resize_cy(arr):
result = np.empty_like(arr)
new_size = _filter_cy(arr, result, arr.size)
result.resize(new_size)
return result
import numba as nb
@nb.njit
def cond_nb(x, k=K):
return x < k
@nb.njit
def filter_nb(arr, cond_nb):
result = np.empty_like(arr)
j = 0
for i in range(arr.size):
if cond_nb(arr[i]):
result[j] = arr[i]
j += 1
return result[:j].copy()
@nb.njit
def _filter_out_nb(arr, out, cond_nb):
j = 0
for i in range(arr.size):
if cond_nb(arr[i]):
out[j] = arr[i]
j += 1
return j
def filter_resize_xnb(arr, cond_nb):
result = np.empty_like(arr)
j = _filter_out_nb(arr, result, cond_nb)
result.resize(j, refcheck=False) # unsupported in NoPython mode
return result
np.fromiter()
@nb.njit
def filter_gen_nb(arr, cond_nb):
for i in range(arr.size):
if cond_nb(arr[i]):
yield arr[i]
def filter_gen_xnb(arr, cond_nb):
return np.fromiter(filter_gen_nb(arr, cond_nb), dtype=arr.dtype)
4b. Using explicit looping with two passes: one to determine the size of the result and one to actually perform the computation
%%cython -c-O3 -c-march=native -a
#cython: language_level=3, boundscheck=False, wraparound=False, initializedcheck=False, cdivision=True, infer_types=True
cdef size_t _filtered_size_cy(long[:] arr, size_t size):
cdef size_t j = 0
for i in range(size):
if cond_cy(arr[i]):
j += 1
return j
def filter2_cy(arr):
cdef size_t new_size = _filtered_size_cy(arr, arr.size)
result = np.empty(new_size, dtype=arr.dtype)
new_size = _filter_cy(arr, result, arr.size)
return result
@nb.njit
def filter2_nb(arr, cond_nb):
j = 0
for i in range(arr.size):
if cond_nb(arr[i]):
j += 1
result = np.empty(j, dtype=arr.dtype)
j = 0
for i in range(arr.size):
if cond_nb(arr[i]):
result[j] = arr[i]
j += 1
return result
(The generator-based filter_fromiter()
method is much slower than the others -- by approx. 2 orders of magnitude.
Similar (and perhaps slightly worse) performances can be expected from a list comprehension.
This would be true for any explicit looping with non-accelerated code.)
The timing would depend on both the input array size and the percent of filtered items.
The first graph addresses the timings as a function of the input size (for ~50% filtering factor -- i.e. 50% of the elements appear in the result):
In general, explicit looping with one form of acceleration leads to the fastest execution, with slight variations depending on input size.
Within NumPy, the integer indexing approaches are basically on par with boolean masking.
The benefits of using np.fromiter()
(no pre-allocation) can be reaped by writing a Numba-accelerated generator, which would come out slower than the other approaches (within an order of magnitude), but much faster than pure-Python looping.
The second graph addresses the timings as a function of items passing through the filter (for a fixed input size of ~1 million elements):
The first observation is that all methods are slowest when approaching a ~50% filling and with less, or more filling they are faster, and fastest towards no filling (highest percent of filtered-out values, lowest percent of passing through values as indicated in the x-axis of the graph).
Again, explicit looping with some mean of acceleration leads to the fastest execution.
Within NumPy, the integer indexing and boolean masking approaches are again basically the same.
(Full code available here)
The generator-based filter_fromiter()
method requires only minimal temporary storage, independently of the size of the input.
Memory-wise this is the most efficient method.
This approach can be effectively speed up with a Numba-accelerated generator.
Of similar memory efficiency are the Cython / Numba two-passes methods, because the size of the output is determined during the first pass. The caveat here is that computing the condition has to be fast for these methods to be fast.
On the memory side, the single-pass solutions for both Cython and Numba require a temporary array of the size of the input. Hence, these are not very memory-efficient compared to two-passes or the generator-based one.
Yet they are of similar asymptotic temporary memory footprint compared to masking, but the constant term is typically larger than masking.
The boolean masking solution requires a temporary array of the size of the input but of type bool
, which in NumPy is 1 byte, so this is ~8 times smaller than the default size of a NumPy array on a typical 64-bit system.
The integer indexing solution has the same requirement as the boolean mask slicing in the first step (inside np.nonzero()
call), which gets converted to a series of int
s (typically int64
on a 64-bit system) in the second step (the output of np.nonzero()
).
This second step, therefore, has variable memory requirements, depending on the number of filtered elements.
arr = np.arange(100)
k = 50
print('`arr[arr > k]` is a copy: ', arr[arr > k].base is None)
# `arr[arr > k]` is a copy: True
print('`arr[np.where(arr > k)]` is a copy: ', arr[np.where(arr > k)].base is None)
# `arr[np.where(arr > k)]` is a copy: True
print('`arr[:k]` is a copy: ', arr[:k].base is None)
# `arr[:k]` is a copy: False
(EDITED: various improvements based on @ShadowRanger, @PaulPanzer, @max9111 and @DavidW comments.)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With