I am quite new to the Python world, so please excuse my dumb question.
In a number of circumstances, I implement functions that work on array-like numerical inputs, and it is usually advantageous to use NumPy utilities for basic operations on the sequences. To this end, I would write something like this:
import numpy as np
def f(x):
    if not isinstance(x, np.ndarray):
        x = np.asarray(x)
    # and from now on we know that x is a NumPy array, with all standard methods
(Note that I don't want to rely on the caller to always pass NumPy arrays.)
I was wondering what the additional overhead would be if I simplified the code by removing the if check, i.e., having something like:
def f(x):
    x = np.asarray(x)
    # and from now on we know that x is a NumPy array, with all standard methods
Basically, the difference between the two versions is that the second is more compact, but it will unnecessarily call np.asarray even if x is already a NumPy array.
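A minimal sketch for checking this empirically, assuming nothing beyond NumPy and the standard-library timeit module; f_checked and f_unconditional are hypothetical names for the two versions above. It confirms that np.asarray returns the input object itself when that input is already a plain ndarray, and times both variants. Absolute timings depend on the machine and NumPy version, so treat the numbers only as a rough comparison.

```python
import timeit

import numpy as np


def f_checked(x):
    # Version 1 (hypothetical name): convert only when x is not already an ndarray
    if not isinstance(x, np.ndarray):
        x = np.asarray(x)
    return x


def f_unconditional(x):
    # Version 2 (hypothetical name): always call np.asarray
    return np.asarray(x)


arr = np.arange(5.0)

# For a plain ndarray, np.asarray hands back the very same object,
# so both versions return the input unchanged
print(f_checked(arr) is arr)        # True
print(f_unconditional(arr) is arr)  # True

# Non-array input is converted in both versions
print(type(f_unconditional([1, 2, 3])))  # <class 'numpy.ndarray'>

# Rough per-call cost when the input is already an ndarray
n = 100_000
t_checked = timeit.timeit(lambda: f_checked(arr), number=n)
t_uncond = timeit.timeit(lambda: f_unconditional(arr), number=n)
print(f"checked:       {t_checked / n * 1e9:.0f} ns per call")
print(f"unconditional: {t_uncond / n * 1e9:.0f} ns per call")
```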
Short answer: since you are checking with isinstance(), you may want numpy.asanyarray(), which passes through any ndarray and any subclass of ndarray without creating a new array.
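A sketch of the function from the question rewritten with np.asanyarray. Here f is a hypothetical stand-in that simply returns the converted input so its behavior is easy to inspect, and np.ma.MaskedArray is used only as a convenient example of an ndarray subclass.

```python
import numpy as np


def f(x):
    # Hypothetical example function: accept any array-like input.
    # np.asanyarray converts lists/tuples to an ndarray, but returns an
    # ndarray, or an instance of an ndarray subclass, as the same object.
    x = np.asanyarray(x)
    # From here on x is an ndarray (or a subclass), with all standard methods
    return x


plain = np.arange(3)
masked = np.ma.masked_array([1, 2, 3], mask=[False, True, False])

print(f(plain) is plain)    # True
print(f(masked) is masked)  # True: the MaskedArray subclass passes through
print(type(f([1, 2, 3])))   # <class 'numpy.ndarray'>
```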
According to the docs for numpy.asarray(), there is no overhead when the input is already an ndarray: no copy is made and the array "passes through". It is worth noting, however, that a subclass of ndarray does not pass through.
Since your original code uses isinstance(x, numpy.ndarray), you most likely want numpy.asanyarray(), which also passes through subclasses of ndarray and is therefore more efficient for your use case (because isinstance() returns True for subclasses as well).
From the numpy.asarray() documentation:
Returns
out : ndarray
    Array interpretation of a. No copy is performed if the input is already an ndarray with matching dtype and order. If a is a subclass of ndarray, a base class ndarray is returned.
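To see the "matching dtype and order" condition in action, here is a small sketch (the variable names are only illustrative): when the requested dtype or memory order differs from the input's, np.asarray has to produce a new array.

```python
import numpy as np

x = np.arange(4, dtype=np.float64)

# Matching dtype: no copy, np.asarray returns the input object itself
same = np.asarray(x, dtype=np.float64)
print(same is x)                    # True

# Different dtype: a converted copy has to be made
as_f32 = np.asarray(x, dtype=np.float32)
print(as_f32 is x)                  # False
print(np.shares_memory(as_f32, x))  # False

# Different memory order: a C-ordered 2-D array requested in Fortran order
m = np.arange(6).reshape(2, 3)      # C-contiguous by default
m_f = np.asarray(m, order='F')
print(m_f is m, m_f.flags['F_CONTIGUOUS'])  # False True
```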
This example from the docs (plus my own comments) explains the differences and why asanyarray() is better for your use case:
>>> issubclass(np.recarray, np.ndarray)
True # This is to show that recarray is a subclass of ndarray
>>> a = np.array([(1.0, 2), (3.0, 4)], dtype='f4,i4').view(np.recarray)
>>> np.asarray(a) is a
False # Here a new base-class ndarray is created (a view of a's data rather than
# a full copy), an extra step you do not need, because recarray is only a subclass of ndarray
>>> np.asanyarray(a) is a
True # Here no copying happens, your subclass of ndarray passes through.
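One detail worth checking: although np.asarray(a) is a returns False, the data buffer is not necessarily duplicated. A quick sketch using the recarray a from the docs example together with np.shares_memory suggests the result is a new plain ndarray object viewing the same memory (b and c are illustrative names).

```python
import numpy as np

a = np.array([(1.0, 2), (3.0, 4)], dtype='f4,i4').view(np.recarray)

b = np.asarray(a)
print(type(b))                 # <class 'numpy.ndarray'>
print(b is a)                  # False: a new array object
print(np.shares_memory(a, b))  # True: the underlying data is shared

c = np.asanyarray(a)
print(c is a)                  # True: the subclass passes through untouched
```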
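Looking at the source of older NumPy versions, where these helpers were thin Python wrappers, np.asarray does:
array(a, dtype, copy=False, order=order)
while np.asanyarray does:
array(a, dtype, copy=False, order=order, subok=True)
The defaults for np.array are:
array(object, dtype=None, copy=True, order='K', subok=False, ndmin=0)
So the only difference between the two is subok=True, which tells np.array to return ndarray subclasses as they are instead of converting them to a base-class ndarray. (NumPy 2.0 changed copy=False to mean "never copy"; np.asarray now defaults to copy=None, i.e. copy only if needed.)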
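The practical effect of those np.array defaults can be checked with a short sketch (x, y and rec are illustrative names): with the default copy=True even an ndarray input is copied, and with the default subok=False a subclass comes back as a plain ndarray, while subok=True keeps the subclass.

```python
import numpy as np

x = np.arange(3)
rec = np.array([(1.0, 2)], dtype='f4,i4').view(np.recarray)

# Default copy=True: even an ndarray input gets a fresh copy
y = np.array(x)
print(y is x, np.shares_memory(x, y))   # False False

# Default subok=False: a subclass is converted to a base-class ndarray
print(type(np.array(rec)))              # <class 'numpy.ndarray'>

# subok=True keeps the subclass, the behavior asanyarray relies on
print(type(np.array(rec, subok=True)))  # <class 'numpy.recarray'>
```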