Context: I would like to use numpy ndarrays with float32 instead of float64.
Edit: Additional context - I'm concerned about how numpy is executing these calls because they will be happening repeatedly as part of a backpropagation routine in a neural net. I'd like the net to carry out all addition/subtraction/multiplication/division in float32 for validation purposes, as I want to compare results with another group's work. It seems like initialization for methods like randn will always go from float64 -> float32 with .astype() casting. Once my ndarray is of type float32, if I use np.dot, for example, will those multiplications happen in float32? How can I verify?
The documentation is not clear to me - http://docs.scipy.org/doc/numpy/reference/generated/numpy.dot.html
I figured out I can just add .astype('float32') to the end of a numpy call, for example, np.random.randn(y, 1).astype('float32').
I also see that dtype=np.float32 is an option, for example, np.zeros(5, dtype=np.float32). However, trying np.random.randn((y, 1), dtype=np.float32) returns the following error:
b = np.random.randn((3,1), dtype=np.float32)
TypeError: randn() got an unexpected keyword argument 'dtype'
What is the difference between declaring the type as float32 using dtype and using .astype()?
Both b = np.zeros(5, dtype=np.float32) and b = np.zeros(5).astype('float32'), when evaluated with:
print(b)
print(type(b))
print(b[0])
print(type(b[0]))
print:
[ 0. 0. 0. 0. 0.]
<class 'numpy.ndarray'>
0.0
<class 'numpy.float32'>
NumPy arrays have a method called astype(data_type) that changes the data type of the array. For example, an array of type float64 can be converted to int32 by passing that data type to astype(). The data type of an array can be checked through its dtype attribute.
If the function accepts the dtype parameter, then use it. If it doesn't accept that parameter, you'll have to use astype. The effect should be the same (in most cases). The function that accepts dtype might be using astype (or an equivalent) in its return expression.
The type of a NumPy array is numpy.ndarray; this is just the type of Python object it is (similar to how type("hello") is str, for example). The dtype defines how the bytes in memory are interpreted, for a scalar (i.e. a single number) or for every element of an array (e.g. as int or float).
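To make the "dtype is how the bytes are interpreted" point concrete, here is a small sketch of my own (not from the posts above). It uses view(), which reinterprets the existing bytes under another dtype, next to astype(), which converts the values into a new array.

```python
import numpy as np

# A single float32 value occupies 4 bytes.
a = np.array([1.0], dtype=np.float32)

# view() reinterprets the same 4 bytes as an unsigned 32-bit integer (no copy).
bits = a.view(np.uint32)

# astype() converts the numeric value into a new array instead.
converted = a.astype(np.uint32)

print(type(a))            # <class 'numpy.ndarray'>, the Python object type
print(a.dtype)            # float32, how the bytes are interpreted
print(hex(int(bits[0])))  # 0x3f800000, the IEEE 754 bit pattern of 1.0
print(int(converted[0]))  # 1
```

The object type stays numpy.ndarray in every case; only the dtype changes.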
The astype() function creates a copy of the array and takes the target data type as a parameter. The data type can be specified using a string, like 'f' for float or 'i' for integer, or you can pass the type directly, like float for float and int for integer.
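One thing to watch when picking a spelling: as far as I know, the string code 'f' means float32 in NumPy, while the Python type float maps to float64. A quick sketch (variable names are mine) that prints what each spelling produces:

```python
import numpy as np

x = np.arange(5)

as_f = x.astype('f')            # single-character code 'f' gives float32
as_py_float = x.astype(float)   # Python's float gives float64
as_named = x.astype('float32')  # spelled-out string name
as_type = x.astype(np.float32)  # NumPy scalar type

for arr in (as_f, as_py_float, as_named, as_type):
    print(arr.dtype)
```

If float32 is what you need, the explicit 'float32' or np.float32 spellings leave the least room for surprises.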
Let's see if I can address some of the confusion I'm seeing in the comments.
Make an array:
In [609]: x=np.arange(5)
In [610]: x
Out[610]: array([0, 1, 2, 3, 4])
In [611]: x.dtype
Out[611]: dtype('int32')
The default for arange is to make an int32 here (the default integer type is platform-dependent).
astype is an array method; it can be used on any array:
In [612]: x.astype(np.float32)
Out[612]: array([ 0., 1., 2., 3., 4.], dtype=float32)
arange also takes a dtype parameter:
In [614]: np.arange(5, dtype=np.float32)
Out[614]: array([ 0., 1., 2., 3., 4.], dtype=float32)
Whether it created the int array first and converted it, or made the float32 directly, isn't any concern to me. This is a basic operation, done in compiled code.
I can also give it a float stop value, in which case it will give me a float array - the default float type.
In [615]: np.arange(5.0)
Out[615]: array([ 0., 1., 2., 3., 4.])
In [616]: _.dtype
Out[616]: dtype('float64')
zeros is similar; the default dtype is float64, but with a parameter I can change that. Since its primary task is to allocate memory, and it doesn't have to do any calculation, I'm sure it creates the desired dtype right away, without further conversion. But again, this is compiled code, and I shouldn't have to worry about what it is doing under the covers.
In [618]: np.zeros(5)
Out[618]: array([ 0., 0., 0., 0., 0.])
In [619]: _.dtype
Out[619]: dtype('float64')
In [620]: np.zeros(5,dtype=np.float32)
Out[620]: array([ 0., 0., 0., 0., 0.], dtype=float32)
randn involves a lot of calculation, and evidently it is compiled to work with the default float type. It does not take a dtype. But since the result is an array, it can be cast with astype.
In [623]: np.random.randn(3)
Out[623]: array([-0.64520949, 0.21554705, 2.16722514])
In [624]: _.dtype
Out[624]: dtype('float64')
In [625]: __.astype(np.float32)
Out[625]: array([-0.64520949, 0.21554704, 2.16722512], dtype=float32)
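As a side note that is not part of the session above: newer NumPy releases (1.17 and later, as far as I know) offer the Generator interface, whose standard_normal method does accept a dtype. A minimal sketch, with an arbitrarily chosen seed, next to the randn-plus-astype route:

```python
import numpy as np

# Generator API: draws float32 samples directly.
rng = np.random.default_rng(0)  # seed 0 is an arbitrary choice
direct = rng.standard_normal((3, 1), dtype=np.float32)

# Legacy route: draw float64, then cast the result.
cast = np.random.randn(3, 1).astype(np.float32)

print(direct.dtype, direct.shape)
print(cast.dtype, cast.shape)
```

The two routes use different random streams, so the values will not match each other; only the dtype and shape are comparable.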
Let me stress that astype is a method of an array. It takes the values of the array and produces a new array with the desired dtype. It does not act retroactively (or in-place) on the array itself, or on the function that created that array.
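A quick way to see that astype leaves the original alone (a sketch with made-up variable names):

```python
import numpy as np

original = np.arange(5, dtype=np.float64)
converted = original.astype(np.float32)

print(original.dtype)                         # float64, unchanged
print(converted.dtype)                        # float32
print(np.shares_memory(original, converted))  # False, separate buffers

# To "change" an array's dtype, rebind the name to the new array.
weights = np.zeros(5)
weights = weights.astype(np.float32)
print(weights.dtype)                          # float32
```

Calling astype and ignoring the return value therefore has no visible effect.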
The effect of astype is often (always?) the same as a dtype parameter, but the sequence of actions is different.
In https://stackoverflow.com/a/39625960/901925 I describe a sparse matrix creator that takes a dtype parameter, and implements it with an astype method call at the end.
When you do calculations such as dot or *, numpy tries to match the output dtype with the inputs. In the case of mixed types it goes with the higher precision alternative.
In [642]: np.arange(5,dtype=np.float32)*np.arange(5,dtype=np.float64)
Out[642]: array([ 0., 1., 4., 9., 16.])
In [643]: _.dtype
Out[643]: dtype('float64')
In [644]: np.arange(5,dtype=np.float32)*np.arange(5,dtype=np.float32)
Out[644]: array([ 0., 1., 4., 9., 16.], dtype=float32)
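The same check works for np.dot, which is probably the simplest way to verify what a backpropagation step returns. This is only a sketch with small made-up arrays; it shows the dtype of the result, not what the compiled code does internally.

```python
import numpy as np

a = np.ones((3, 4), dtype=np.float32)
b = np.ones((4, 2), dtype=np.float32)

same = np.dot(a, b)                        # float32 with float32
mixed = np.dot(a, b.astype(np.float64))    # float32 with float64
scaled = a * 2.0                           # float32 array times a Python float

print(same.dtype)                               # float32
print(mixed.dtype)                              # float64
print(scaled.dtype)                             # float32
print(np.result_type(np.float32, np.float64))   # float64
```

A single float64 array slipping into the chain is enough to promote everything downstream, so asserting on .dtype at a few points in the network is a cheap safeguard.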
There are casting rules. One way to look those up is with the can_cast function:
In [649]: np.can_cast(np.float64,np.float32)
Out[649]: False
In [650]: np.can_cast(np.float32,np.float64)
Out[650]: True
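can_cast also takes a casting argument, and promote_types reports which dtype a mixed pair ends up as. A short sketch of both, assuming the default 'safe' rule is what the session above used:

```python
import numpy as np

# Default rule is 'safe': only casts that cannot lose information are allowed.
safe_down = np.can_cast(np.float64, np.float32)
# 'same_kind' also allows float -> smaller float.
same_kind_down = np.can_cast(np.float64, np.float32, casting='same_kind')

# The dtype that two inputs are promoted to.
float_pair = np.promote_types(np.float32, np.float64)
int_float = np.promote_types(np.int32, np.float32)

print(safe_down)        # False
print(same_kind_down)   # True
print(float_pair)       # float64
print(int_float)        # float64
```

The last line is worth noting for a float32 pipeline: mixing an int32 array with a float32 array promotes to float64.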
It is possible in some calculations that it will cast the 32 to 64, do the calculation, and then cast back to 32. The purpose would be to avoid rounding errors. But I don't know how you find that out from the documentation or tests.
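I don't know of a way to inspect the internal precision directly, but you can at least measure how far a float32 result is from a float64 reference. The sketch below uses random test matrices of my own choosing; it tells you the size of the observable difference, not how the compiled routine accumulates.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed
a64 = rng.standard_normal((50, 50))
b64 = rng.standard_normal((50, 50))

# float32 copies of the same inputs.
a32 = a64.astype(np.float32)
b32 = b64.astype(np.float32)

out32 = np.dot(a32, b32)   # computed from float32 inputs
ref64 = np.dot(a64, b64)   # float64 reference

# Largest element-wise deviation from the reference.
max_abs_diff = np.max(np.abs(out32.astype(np.float64) - ref64))

print(out32.dtype)
print(max_abs_diff)
```

When comparing against another group's float32 results, a tolerance of roughly this size is more realistic than expecting bit-for-bit equality.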