
Confused about the `copy` parameter of `numpy.astype`

I am confused about the copy parameter of numpy.astype. I checked the linked documentation, which says:

By default, astype always returns a newly allocated array. If this is set to false, and the dtype, order, and subok requirements are satisfied, the input array is returned instead of a copy.

Does this mean that it will change the original values of an ndarray object? For example:

x = np.array([1, 2, 2.5])
x.astype(int, copy=False)

But x still seems to hold the original values, array([ 1. , 2. , 2.5]). Can anyone explain this? Thank you very much.

JustinGong asked Nov 22 '17 03:11


3 Answers

What they mean is that if the original array exactly meets the specifications you passed, i.e. it has the correct dtype and memory order (majorness) and is either not a subclass or you set the subok flag, then a copy is avoided. The input array is never modified. In your example the dtypes don't match, so a new array is made regardless.
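As a minimal sketch of those conditions (the array names are illustrative and the explicit 64-bit dtypes are an assumption made for portability), the identity check `is` shows when the input itself comes back:

```python
import numpy as np

a = np.arange(6, dtype=np.float64).reshape(2, 3)   # C-ordered float64 array

# dtype and order already match: the very same object is returned
same = a.astype(np.float64, copy=False)
print(same is a)                      # True

# dtype differs: a new array is allocated even with copy=False
as_int = a.astype(np.int64, copy=False)
print(as_int is a)                    # False
print(np.shares_memory(as_int, a))    # False

# order differs: a 2-D C-ordered array is not Fortran-ordered, so a copy is made
as_f = a.astype(np.float64, order='F', copy=False)
print(as_f is a)                      # False

# in every case the input is left untouched
print(a.dtype)                        # float64
```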

If you want the data not to be copied, use view instead. This will, if at all possible, reinterpret the data buffer according to your specs; the raw bytes are reinterpreted, not converted.

x = np.array([1, 2, 2.5])
y = x.view(int)
y
#  array([4607182418800017408, 4611686018427387904, 4612811918334230528])
# y and x share the same data buffer:
y[...] = 0
x
# array([ 0.,  0.,  0.])
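To make the difference concrete, here is a small sketch (using explicit 64-bit types, an assumption made so the item sizes match on every platform) contrasting `view`, which relabels the same bytes, with `astype`, which converts the values into a new buffer:

```python
import numpy as np

x = np.array([1, 2, 2.5], dtype=np.float64)

converted = x.astype(np.int64)    # values are converted: 1, 2, 2 (2.5 is truncated)
reinterpreted = x.view(np.int64)  # same bytes read as integers: large bit patterns

print(converted.tolist())                  # [1, 2, 2]
print(np.shares_memory(converted, x))      # False: astype allocated a new buffer
print(np.shares_memory(reinterpreted, x))  # True: view reuses x's buffer
print(reinterpreted[0])                    # 4607182418800017408, the bit pattern of 1.0
```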
Paul Panzer answered Sep 23 '22 17:09


By default, astype always returns a newly allocated array. If this is set to false, and the dtype, order, and subok requirements are satisfied, the input array is returned instead of a copy.

Notice that the documentation you quoted doesn't mention x being modified at all - in fact, either a brand new array of the desired type is returned, or x is returned unmodified (if possible).

In your case, x doesn't meet the dtype requirement. The documentation doesn't spell that requirement out (so I can understand your confusion), but it means that the array must already have the requested dtype (int in this case). x holds floats, and a float buffer can't simply be relabelled as an int array: every value has to be converted, and 2.5 can't be stored as an int without losing information.

As such, astype returns a new copy of x, with each value converted to int. It leaves x unmodified, so to get the converted array you need to check the value returned from astype:

import numpy as np

x = np.array([1, 2, 2.5])
y = x.astype(int, copy=False)

print(x)  # prints [1.  2.  2.5], since x hasn't been modified
print(y)  # prints [1 2 2], since y is an integer-valued copy of x
Mac answered Sep 24 '22 17:09


Here's a case where copy=False works, returning the original array:

In [238]: x
Out[238]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
In [239]: y = x.astype(int,copy=False)
In [240]: id(x)
Out[240]: 2884971680
In [241]: id(y)
Out[241]: 2884971680     # same id

In [242]: z = x.astype(int)
In [243]: id(z)
Out[243]: 2812517656     # different id

In a sense this is a trivial case, but I wouldn't be surprised if every other case is just as trivial:

In [244]: w = x.astype(int,order='F',copy=False)
In [245]: id(w)
Out[245]: 2884971680   # 1d array is both order C and F

In other words, it returns the original array if the requested dtype and order don't require any changes, that is, if the original already meets the specs.

This isn't the same as a view. A view is a new array (new id) that shares the data buffer. Rather, it is more like a simple Python assignment, y = x.
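The three situations can be told apart with `is` and `np.shares_memory`; a short sketch, with illustrative variable names:

```python
import numpy as np

x = np.arange(10)

same = x.astype(x.dtype, copy=False)  # requirements met: x itself is returned
alias = x                             # plain assignment, for comparison
v = x.view()                          # new array object over the same buffer
c = x.astype(x.dtype)                 # default copy=True: new object, new buffer

print(same is x, alias is x)             # True True
print(v is x, np.shares_memory(v, x))    # False True
print(c is x, np.shares_memory(c, x))    # False False
```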

I may change my mind if someone can come up with a copy=False case that involves a change in dtype.


The same call with a different array will create a copy:

In [249]: x1=np.arange(10.)      # float
In [250]: y1=x1.astype(int, copy=False)
In [251]: id(x1)
Out[251]: 2812517696
In [253]: id(y1)
Out[253]: 2812420768        # different id
In [254]: y1=x1.astype(float, copy=False)
In [255]: id(y1)
Out[255]: 2812517696

So you could use copy=False if you want, say, an int dtype array, without any loss of efficiency when the array is already int.
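One way to use this, sketched with a hypothetical helper `as_int_array` (not part of NumPy), is a function that guarantees an integer array without paying for a copy when the caller already supplies one. `np.int64` is chosen here only for illustration.

```python
import numpy as np

def as_int_array(arr):
    """Return arr as int64, reusing arr itself when it is already int64."""
    # hypothetical helper: copy=False skips the copy only if no conversion is needed
    return arr.astype(np.int64, copy=False)

ints = np.arange(5, dtype=np.int64)
floats = np.arange(5.0)

print(as_int_array(ints) is ints)      # True: nothing to convert, no copy
print(as_int_array(floats) is floats)  # False: a converted copy is returned
print(as_int_array(floats).dtype)      # int64
```

Because the result may be the caller's own array, writing into it can modify the input; keep the default copy=True when the function needs a private array.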


Efficient way to cast scalars to numpy arrays

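np.array with copy=False behaves in much the same way, returning the same array (same id) if no transformation is required. (In NumPy 2.0 and later, np.array(..., copy=False) instead raises an error when a copy cannot be avoided; np.asarray keeps the copy-only-if-needed behaviour.)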
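A small sketch of that behaviour, written with `np.asarray` so that it runs the same way on old and new NumPy releases:

```python
import numpy as np

x = np.arange(10)

same = np.asarray(x)                     # already an ndarray: returned as-is
converted = np.asarray(x, dtype=float)   # dtype change: a new array is built

print(same is x)                         # True
print(converted is x)                    # False
print(np.shares_memory(converted, x))    # False
```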

hpaulj answered Sep 22 '22 17:09