
Is there a significant overhead in calling `np.asarray` on a NumPy array?

I am quite new to the Python world, so please excuse my dumb question.

In a number of circumstances, I implement functions that work on array-like numerical inputs, and it is usually advantageous to use NumPy utilities for basic operations on the sequences. To this end, I would write something like this:

import numpy as np

def f(x):
    if not isinstance(x, np.ndarray):
        x = np.asarray(x)
    # and from now on we know that x is a NumPy array, with all standard methods

(Note that I don't want to rely on the caller to always pass NumPy arrays.)

I was wondering what the additional overhead would be if I simplified the code by removing the if, i.e., having something like

def f(x):
    x = np.asarray(x)
    # and from now on we know that x is a NumPy array, with all standard methods

Basically, the difference between the two cases is that the second version is more compact, but it calls np.asarray even when x is already a NumPy array.

MikeL asked Jun 28 '19 10:06


2 Answers

Short answer: Since you are checking with isinstance(), you may use numpy.asanyarray(), which passes through any ndarray and its subclasses without overhead.

According to the docs for numpy.asarray(), there is no overhead when the input is already an ndarray with matching dtype and order: no copying happens, and the array "passes through". Note, however, that an instance of an ndarray subclass does not pass through unchanged: a base-class ndarray is returned instead.
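If you want to check this on your own machine, a small timeit comparison is one way to do it. This is only a sketch: f_checked and f_plain are illustrative names for the two variants from the question, and the absolute timings will depend on your NumPy version and hardware.

```python
import timeit

import numpy as np


def f_checked(x):
    # Variant from the question: explicit isinstance() guard.
    if not isinstance(x, np.ndarray):
        x = np.asarray(x)
    return x


def f_plain(x):
    # Simplified variant: always call np.asarray().
    return np.asarray(x)


arr = np.arange(1_000_000)

# Time many calls on an input that is already an ndarray.
n_calls = 50_000
t_checked = timeit.timeit(lambda: f_checked(arr), number=n_calls)
t_plain = timeit.timeit(lambda: f_plain(arr), number=n_calls)

print(f"with isinstance check: {t_checked / n_calls * 1e9:.0f} ns per call")
print(f"always np.asarray:     {t_plain / n_calls * 1e9:.0f} ns per call")
```

Both variants return the very same object for an ndarray input, so whatever difference you measure is a fixed per-call cost that does not grow with the size of the array.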

Since your original code uses isinstance(x, numpy.ndarray), which returns True for subclasses as well, you most likely want numpy.asanyarray(). It also passes through subclasses of ndarray, which would be more efficient for your use case.

The numpy.asarray() docs describe the return value as follows:

Returns: out : ndarray
    Array interpretation of a. No copy is performed if the input is already an ndarray with matching dtype and order. If a is a subclass of ndarray, a base class ndarray is returned.
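The "matching dtype" condition in that description can be seen in a short sketch. The variable names here are my own and only for illustration:

```python
import numpy as np

x = np.arange(5, dtype=np.float64)

same = np.asarray(x)                       # matching dtype: x itself comes back
converted = np.asarray(x, dtype=np.int32)  # different dtype requested: a new array
from_list = np.asarray([1.0, 2.0, 3.0])    # non-array input: a new array is built

print(same is x)                          # True
print(converted is x)                     # False
print(np.shares_memory(converted, x))     # False
print(isinstance(from_list, np.ndarray))  # True
```

So the call is only free when no conversion is requested; asking for a different dtype forces a real copy of the data.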

This example from the docs (plus my own comments) explains the differences and why asanyarray() is better for your use case:

>>> issubclass(np.recarray, np.ndarray)
True   # This is to show that recarray is a subclass of ndarray
>>> a = np.array([(1.0, 2), (3.0, 4)], dtype='f4,i4').view(np.recarray)
>>> np.asarray(a) is a
False  # Not the same object: a new base-class ndarray is created (a view
       # of the same data, not a copy of it), which is overhead you do not
       # want. This happens because recarray is only a subclass of ndarray.
>>> np.asanyarray(a) is a
True   # Here the same object is returned; your subclass of ndarray passes through.
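To see what asarray() actually does with a subclass instance, you can check object identity and memory sharing separately. A minimal sketch, assuming a do-nothing subclass (MyArray is a hypothetical name, not part of NumPy):

```python
import numpy as np


class MyArray(np.ndarray):
    """Hypothetical do-nothing subclass, used only for illustration."""


data = np.arange(6.0)
sub = data.view(MyArray)

base = np.asarray(sub)     # converts to the base class
same = np.asanyarray(sub)  # lets the subclass through

print(type(base) is np.ndarray)     # True: subclass type is dropped
print(base is sub)                  # False: a new array object
print(np.shares_memory(base, sub))  # True: same underlying buffer
print(same is sub)                  # True: the very same object
```

In other words, for a subclass the cost of asarray() is a new array object wrapping the same buffer, and in addition you lose the subclass type, while asanyarray() hands back the input untouched.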
bakkal answered Oct 26 '22 23:10


Looking at the code, np.asarray does:

array(a, dtype, copy=False, order=order)

np.asanyarray does:

array(a, dtype, copy=False, order=order, subok=True)

The defaults for np.array are:

array(object, dtype=None, copy=True, order='K', subok=False, ndmin=0)
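The effect of the copy and subok arguments can be checked directly. A small sketch (MyArray is a hypothetical subclass used only for illustration; note that in NumPy 2.x copy=False raises an error if a copy would be required, which is not the case here):

```python
import numpy as np


class MyArray(np.ndarray):
    """Hypothetical subclass, used only for illustration."""


a = np.arange(4)
sub = a.view(MyArray)

default_call = np.array(a)                         # copy=True by default
no_copy = np.array(a, copy=False)                  # what asarray boils down to
keep_sub = np.array(sub, copy=False, subok=True)   # what asanyarray boils down to

print(default_call is a)  # False: np.array copies by default
print(no_copy is a)       # True: no copy needed, so a itself is returned
print(keep_sub is sub)    # True: the subclass instance is passed through
```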
hpaulj answered Oct 26 '22 23:10