I am confused about the copy argument of numpy.ndarray.astype.
I checked the documentation at the link, which says:
By default, astype always returns a newly allocated array. If this is set to false, and the dtype, order, and subok requirements are satisfied, the input array is returned instead of a copy.
Does this mean that it will change the original values of an ndarray object? For example:
x = np.array([1, 2, 2.5])
x.astype(int, copy=False)
But it seems that x still holds the original values, array([ 1. , 2. , 2.5]).
Can anyone explain this? Thank you very much.
What they mean is that if the original array exactly meets the specifications you passed, i.e. it has the correct dtype and memory order (majorness) and is either not a subclass or you set the subok flag, then a copy is avoided. The input array is never modified. In your example the dtypes don't match, so a new array is made regardless.
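As a minimal sketch of the two situations described above (the explicit np.float64 and np.int64 dtypes are my choice, so the result does not depend on the platform's default integer size), the identity check below shows when astype hands back the input array and when it allocates a new one:

```python
import numpy as np

x = np.array([1, 2, 2.5])  # dtype is float64

# Requested dtype already matches: the input array itself is returned.
same = x.astype(np.float64, copy=False)

# Requested dtype differs: a new, converted array is allocated.
converted = x.astype(np.int64, copy=False)

print(same is x)                       # True
print(converted is x)                  # False
print(np.shares_memory(converted, x))  # False
print(x.tolist())                      # [1.0, 2.0, 2.5], x is untouched
```

In both cases x keeps its original values; copy=False only controls whether a new allocation can be skipped.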
If you don't want the data to be copied, use view instead. Where possible, this reinterprets the data buffer according to the dtype you specify, without converting the values.
x = np.array([1, 2, 2.5])
y = x.view(int)
y
# array([4607182418800017408, 4611686018427387904, 4612811918334230528])
# y and x share the same data buffer:
y[...] = 0
x
# array([ 0., 0., 0.])
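To make the difference between reinterpreting and converting explicit, here is a small sketch that puts view and astype side by side. It assumes 64-bit dtypes on both sides (np.float64 and np.int64 are spelled out for that reason), and np.shares_memory is used to check whether the buffer is shared:

```python
import numpy as np

x = np.array([1.0, 2.0, 2.5], dtype=np.float64)

v = x.view(np.int64)    # same bytes, reinterpreted as 64-bit integers
a = x.astype(np.int64)  # new buffer, each value converted to an integer

print(np.shares_memory(v, x))  # True
print(np.shares_memory(a, x))  # False
print(a.tolist())              # [1, 2, 2]
```

The view avoids the copy, but its elements are the raw bit patterns of the floats rather than the integer values 1, 2, 2.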
By default, astype always returns a newly allocated array. If this is set to false, and the dtype, order, and subok requirements are satisfied, the input array is returned instead of a copy.
Notice that the documentation you quoted doesn't mention x being modified at all. In fact, either a brand new array of the desired type is returned, or x is returned unmodified (if possible).
In your case, I believe x doesn't meet the dtype requirement. The documentation doesn't actually spell out that requirement (so I can understand your confusion), but basically it means that the array must already have the requested dtype (int in this case), whereas x has dtype float. Since you can't cram a float into an int without converting it (and possibly losing some information), you can't simply pretend that x is an int array.
As such, astype returns a new copy of x, with each value converted to int. It leaves x unmodified, so to get the converted array you need to check the value returned from astype:
import numpy as np
x = np.array([1, 2, 2.5])
y = x.astype(int, copy=False)
print(x)  # prints [1.  2.  2.5], since x hasn't been modified
print(y)  # prints [1 2 2], since y is an integer-valued copy of x
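To check what the dtype requirement amounts to in practice, the sketch below (with array names of my own choosing) widens an int32 array to int64. Every int32 value fits in an int64, yet copy=False still produces a new array, which suggests the test is whether the dtype already matches rather than whether the conversion is lossless:

```python
import numpy as np

small = np.array([1, 2, 3], dtype=np.int32)

wide = small.astype(np.int64, copy=False)  # lossless, but the dtype changes
same = small.astype(np.int32, copy=False)  # dtype already matches

print(np.can_cast(np.int32, np.int64))  # True
print(wide is small)                    # False
print(same is small)                    # True
```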
Here's a case where copy=False works, returning the original array:
In [238]: x
Out[238]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
In [239]: y = x.astype(int,copy=False)
In [240]: id(x)
Out[240]: 2884971680
In [241]: id(y)
Out[241]: 2884971680 # same id
In [242]: z = x.astype(int)
In [243]: id(z)
Out[243]: 2812517656 # different id
In a sense this is a trivial case, but I wouldn't be surprised if every other case is just as trivial:
In [244]: w = x.astype(int,order='F',copy=False)
In [245]: id(w)
Out[245]: 2884971680 # 1d array is both order C and F
In other words, it returns the original array if the required dtype and order don't call for any changes, that is, if the original already meets the specs.
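The order requirement only becomes visible with more than one dimension, since a 1d array is both C and F ordered. A small sketch, assuming a 2x3 array built with reshape (which is C-contiguous but not F-contiguous):

```python
import numpy as np

a = np.arange(6).reshape(2, 3)  # 2-D, C-contiguous only

same = a.astype(a.dtype, order='C', copy=False)     # already C ordered
fortran = a.astype(a.dtype, order='F', copy=False)  # layout must change

print(same is a)                      # True
print(fortran is a)                   # False
print(fortran.flags['F_CONTIGUOUS'])  # True
```

Here the dtype is unchanged in both calls, so the copy in the second call is caused by the order argument alone.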
This isn't the same as a view. A view is a new array (new id) that shares the data buffer. Rather, it is more like a plain Python assignment, y = x.
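The distinction between "same object" and "view" can be checked directly. In this sketch (variable names are mine), alias comes from astype with copy=False and v from an explicit view call:

```python
import numpy as np

x = np.arange(10)

alias = x.astype(x.dtype, copy=False)  # the very same object as x
v = x.view()                           # a new array object on x's buffer

print(alias is x)              # True
print(v is x)                  # False
print(v.base is x)             # True
print(np.shares_memory(v, x))  # True
```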
I may change my mind if someone can come up with a copy=False case that involves a change in dtype.
The same call with a different array will create a copy:
In [249]: x1=np.arange(10.) # float
In [250]: y1=x1.astype(int, copy=False)
In [251]: id(x1)
Out[251]: 2812517696
In [253]: id(y1)
Out[253]: 2812420768 # different id
In [254]: y1=x1.astype(float, copy=False)
In [255]: id(y1)
Out[255]: 2812517696
So you could use copy=False if you want, say, an int dtype array, but without any loss in efficiency if the array is already int.
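That usage pattern might look like the following sketch. The helper name as_int_array is hypothetical, and np.int64 is fixed explicitly so the behaviour does not depend on the platform's default int:

```python
import numpy as np

def as_int_array(arr):
    """Hypothetical helper: return arr as int64, copying only when needed."""
    return arr.astype(np.int64, copy=False)

ints = np.arange(5, dtype=np.int64)
floats = np.arange(5.0)

print(as_int_array(ints) is ints)      # True, no copy made
print(as_int_array(floats) is floats)  # False, converted copy
print(as_int_array(floats).dtype)      # int64
```

Note that when no copy is made, the caller gets back the input array itself, so in-place changes to the result also change the original.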
Efficient way to cast scalars to numpy arrays
np.array with copy=False behaves in much the same way - returning the same array (same id) if no transformation is required.
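A sketch of the same check for array construction. It uses np.asarray rather than np.array(..., copy=False), because, as far as I know, newer NumPy releases (2.x) raise an error from np.array when copy=False is given and a copy cannot be avoided, while np.asarray copies only if needed on both old and new versions:

```python
import numpy as np

x = np.arange(10, dtype=np.int64)

same = np.asarray(x, dtype=np.int64)         # nothing to change
converted = np.asarray(x, dtype=np.float64)  # dtype change forces a copy

print(same is x)       # True
print(converted is x)  # False
```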