Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What is the difference between dtype= and .astype() in numpy?

Context: I would like to use numpy ndarrays with float32 instead of float64.

Edit: Additional context - I'm concerned about how numpy is executing these calls because they will be happening repeatedly as part of a backpropagation routine in a neural net. I'd like the net to carry out all addition/subtraction/multiplication/division in float32 for validation purposes, as I want to compare results with another group's work. It seems like initialization for methods like randn will always go from float64 -> float32 with .astype() casting. Once my ndarray is of type float32 if i use np.dot for example will those multiplications happen in float32? How can I verify?

The documentation is not clear to me - http://docs.scipy.org/doc/numpy/reference/generated/numpy.dot.html

I figured out I can just add .astype('float32') to the end of a numpy call, for example, np.random.randn(y, 1).astype('float32').

I also see that dtype=np.float32 is an option, for example, np.zeros(5, dtype=np.float32). However, trying np.random.randn((y, 1), dtype=np.float32) returns the following error:

    b = np.random.randn((3,1), dtype=np.float32)
TypeError: randn() got an unexpected keyword argument 'dtype'

What is the difference between declaring the type as float32 using dtype and using .astype()?

Both b = np.zeros(5, dtype=np.float32) and b = np.zeros(5).astype('float32') when evaluated with:

print(type(b))
print(b[0])
print(type(b[0]))

prints:

[ 0.  0.  0.  0.  0.]
<class 'numpy.ndarray'>
0.0
<class 'numpy.float32'>
like image 773
phoenixdown Avatar asked Sep 21 '16 17:09

phoenixdown


People also ask

What is Astype numpy?

Practical Data Science using Python We have a method called astype(data_type) to change the data type of a numpy array. If we have a numpy array of type float64, then we can change it to int32 by giving the data type to the astype() method of numpy array. We can check the type of numpy array using the dtype class.

What is the difference between .dtypes and Astype function?

If the function accepts the dtype parameter then use it. If it doesn't accept that parameter you'll have to use the astype . The effect should be the same (in most cases). The function that accepts dtype might be using astype (or equivalent) in its return expression.

What is difference between Dtype and type?

The type of a NumPy array is numpy. ndarray ; this is just the type of Python object it is (similar to how type("hello") is str for example). dtype just defines how bytes in memory will be interpreted by a scalar (i.e. a single number) or an array and the way in which the bytes will be treated (e.g. int / float ).

How do you use Astype numpy?

The astype() function creates a copy of the array, and allows you to specify the data type as a parameter. The data type can be specified using a string, like 'f' for float, 'i' for integer etc. or you can use the data type directly like float for float and int for integer.


1 Answers

Let's see if I can address some of the confusion I'm seeing in the comments.

Make an array:

In [609]: x=np.arange(5)
In [610]: x
Out[610]: array([0, 1, 2, 3, 4])
In [611]: x.dtype
Out[611]: dtype('int32')

The default for arange is to make an int32.

astype is an array method; it can used on any array:

In [612]: x.astype(np.float32)
Out[612]: array([ 0.,  1.,  2.,  3.,  4.], dtype=float32)

arange also takes a dtype parameter

In [614]: np.arange(5, dtype=np.float32)
Out[614]: array([ 0.,  1.,  2.,  3.,  4.], dtype=float32)

whether it created the int array first and converted it, or made the float32 directly isn't any concern to me. This is a basic operation, done in compiled code.

I can also give it a float stop value, in which case it will give me a float array - the default float type.

In [615]: np.arange(5.0)
Out[615]: array([ 0.,  1.,  2.,  3.,  4.])
In [616]: _.dtype
Out[616]: dtype('float64')

zeros is similar; the default dtype is float64, but with a parameter I can change that. Since its primary task with to allocate memory, and it doesn't have to do any calculation, I'm sure it creates the desired dtype right away, without further conversion. But again, this is compiled code, and I shouldn't have to worry about what it is doing under the covers.

In [618]: np.zeros(5)
Out[618]: array([ 0.,  0.,  0.,  0.,  0.])
In [619]: _.dtype
Out[619]: dtype('float64')
In [620]: np.zeros(5,dtype=np.float32)
Out[620]: array([ 0.,  0.,  0.,  0.,  0.], dtype=float32)

randn involves a lot of calculation, and evidently it is compiled to work with the default float type. It does not take a dtype. But since the result is an array, it can be cast with astype.

In [623]: np.random.randn(3)
Out[623]: array([-0.64520949,  0.21554705,  2.16722514])
In [624]: _.dtype
Out[624]: dtype('float64')
In [625]: __.astype(np.float32)
Out[625]: array([-0.64520949,  0.21554704,  2.16722512], dtype=float32)

Let me stress that astype is a method of an array. It takes the values of the array and produces a new array with the desire dtype. It does not act retroactively (or in-place) on the array itself, or on the function that created that array.

The effect of astype is often (always?) the same as a dtype parameter, but the sequence of actions is different.

In https://stackoverflow.com/a/39625960/901925 I describe a sparse matrix creator that takes a dtype parameter, and implements it with an astype method call at the end.

When you do calculations such as dot or *, it tries to match the output dtype with inputs. In the case of mixed types it goes with the higher precision alternative.

In [642]: np.arange(5,dtype=np.float32)*np.arange(5,dtype=np.float64)
Out[642]: array([  0.,   1.,   4.,   9.,  16.])
In [643]: _.dtype
Out[643]: dtype('float64')
In [644]: np.arange(5,dtype=np.float32)*np.arange(5,dtype=np.float32)
Out[644]: array([  0.,   1.,   4.,   9.,  16.], dtype=float32)

There are casting rules. One way to look those up is with can_cast function:

In [649]: np.can_cast(np.float64,np.float32)
Out[649]: False
In [650]: np.can_cast(np.float32,np.float64)
Out[650]: True

It is possible in some calculations that it will cast the 32 to 64, do the calculation, and then cast back to 32. The purpose would be to avoid rounding errors. But I don't know how you find that out from the documentation or tests.

like image 148
hpaulj Avatar answered Nov 15 '22 04:11

hpaulj