Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Speed of copying numpy array

I am wondering if there is any downside of using b = np.array(a) rather than b = np.copy(a) to copy a Numpy array a into b. When I %timeit, the former can be upto 100% faster.

In both cases b is a is False, and I can manipulate b leaving a intact, so I suppose this does what is expected from .copy().

Am I missing anything? What is improper about using np.array to do copy an array?

with python 3.6.5, numpy 1.14.2, while the speed difference closes rapidly for larger sizes:

a = np.arange(1000)

%timeit np.array(a)
501 ns ± 30.1 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

%timeit np.copy(a)  
1.1 µs ± 35.7 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
like image 597
sencer Avatar asked Jun 12 '18 20:06

sencer


3 Answers

From documentation of numpy.copy:

This is equivalent to:

>>> np.array(a, copy=True)

Also, if you look at the source code:

def copy(a, order='K'):
    return array(a, order=order, copy=True)

Some timings:

In [1]: import numpy as np

In [2]: a = np.ascontiguousarray(np.random.randint(0, 20000, 1000))

In [3]: %timeit b = np.array(a)
562 ns ± 10.1 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [4]: %timeit b = np.array(a, order='K', copy=True)
1.1 µs ± 10.8 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [5]: %timeit b = np.copy(a)
1.21 µs ± 9.28 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [6]: a = np.ascontiguousarray(np.random.randint(0, 20000, 1000000))

In [7]: %timeit b = np.array(a)
310 µs ± 6.31 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [8]: %timeit b = np.array(a, order='K', copy=True)
311 µs ± 2.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [9]: %timeit b = np.copy(a)
313 µs ± 4.33 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [10]: print(np.__version__)
1.13.3

It is unexpected that simply explicitly setting parameters to their default values changes the speed of execution of np.array(). On the other hand, maybe just processing these explicit arguments adds enough execution time to make a difference for small arrays. Indeed, from the source code for the numpy.array(), one can see that there are many more checks and more processing being performed when keyword arguments are provided, for example, see goto full_path. When keyword parameters are not set, the execution skips all the way down to goto finish. This overhead (of additional processing of keyword arguments) is what you detect in timings for small arrays. For larger arrays this overhead is insignificant in comparison to the actual time of copying the arrays.

like image 129
AGN Gazer Avatar answered Oct 11 '22 20:10

AGN Gazer


"What is improper about using np.array to do copy an array?"

I'd argue it is harder to read. Because it is not obvious that array makes a copy, for example, the similar asarray does not make a copy if it doesn't have to. The reader basically has to know the default value of the copy keyword argument to be sure.

like image 22
Paul Panzer Avatar answered Oct 11 '22 20:10

Paul Panzer


As AGN pointed out, np.array is faster than np.copy because essentially the latter is a wrapper of the former. This means python "loses" some extra time searching for both functions. A similar thing happens with decorators.

This extra time is insignificant for pratical purposes, and you gain better code readability.

You can test it by using a big array (where the array creation takes the main time), and you'll see very little differences in %timeit for both.

like image 28
Tarifazo Avatar answered Oct 11 '22 20:10

Tarifazo