I am wondering if there is any downside of using b = np.array(a)
rather than b = np.copy(a)
to copy a Numpy array a
into b. When I %timeit
, the former can be upto 100% faster.
In both cases b is a
is False
, and I can manipulate b
leaving a
intact, so I suppose this does what is expected from .copy()
.
Am I missing anything? What is improper about using np.array
to do copy an array?
with python 3.6.5, numpy 1.14.2, while the speed difference closes rapidly for larger sizes:
a = np.arange(1000)
%timeit np.array(a)
501 ns ± 30.1 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
%timeit np.copy(a)
1.1 µs ± 35.7 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
From documentation of numpy.copy
:
This is equivalent to:
>>> np.array(a, copy=True)
Also, if you look at the source code:
def copy(a, order='K'):
return array(a, order=order, copy=True)
Some timings:
In [1]: import numpy as np
In [2]: a = np.ascontiguousarray(np.random.randint(0, 20000, 1000))
In [3]: %timeit b = np.array(a)
562 ns ± 10.1 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [4]: %timeit b = np.array(a, order='K', copy=True)
1.1 µs ± 10.8 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [5]: %timeit b = np.copy(a)
1.21 µs ± 9.28 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [6]: a = np.ascontiguousarray(np.random.randint(0, 20000, 1000000))
In [7]: %timeit b = np.array(a)
310 µs ± 6.31 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [8]: %timeit b = np.array(a, order='K', copy=True)
311 µs ± 2.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [9]: %timeit b = np.copy(a)
313 µs ± 4.33 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [10]: print(np.__version__)
1.13.3
It is unexpected that simply explicitly setting parameters to their default values changes the speed of execution of np.array()
. On the other hand, maybe just processing these explicit arguments adds enough execution time to make a difference for small arrays. Indeed, from the source code for the numpy.array()
, one can see that there are many more checks and more processing being performed when keyword arguments are provided, for example, see goto full_path
. When keyword parameters are not set, the execution skips all the way down to goto finish
. This overhead (of additional processing of keyword arguments) is what you detect in timings for small arrays. For larger arrays this overhead is insignificant in comparison to the actual time of copying the arrays.
"What is improper about using np.array to do copy an array?"
I'd argue it is harder to read. Because it is not obvious that array
makes a copy, for example, the similar asarray
does not make a copy if it doesn't have to. The reader basically has to know the default value of the copy
keyword argument to be sure.
As AGN pointed out, np.array is faster than np.copy because essentially the latter is a wrapper of the former. This means python "loses" some extra time searching for both functions. A similar thing happens with decorators.
This extra time is insignificant for pratical purposes, and you gain better code readability.
You can test it by using a big array (where the array creation takes the main time), and you'll see very little differences in %timeit for both.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With