What does .dtype do?

Tags: python, numpy

I am new to Python, and don't understand what .dtype does.
For example:

>>> aa
array([1, 2, 3, 4, 5, 6, 7, 8])
>>> aa.dtype = "float64"
>>> aa
array([  4.24399158e-314,   8.48798317e-314,   1.27319747e-313,
     1.69759663e-313])

I thought dtype is a property of aa, which should be int, and if I assign aa.dtype = "float64"
then aa should become array([1.0 ,2.0 ,3.0, 4.0, 5.0, 6.0, 7.0, 8.0]).

Why does it change its value and size?
What does it mean?

I was actually learning from a piece of code, which I'll paste here:

def to_1d(array):
    """prepares an array into a 1d real vector"""
    a = array.copy() # copy the array, to avoid changing global
    orig_dtype = a.dtype
    a.dtype = "float64" # this doubles the size of array
    orig_shape = a.shape
    return a.ravel(), (orig_dtype, orig_shape) #flatten and return

I think it shouldn't change the values of the input array but only change its size. I'm confused about how the function works.

asked Feb 26 '12 20:02

user1233157

2 Answers

By changing the dtype in this way, you are changing the way a fixed block of memory is being interpreted.

Example:

>>> import numpy as np
>>> a=np.array([1,0,0,0,0,0,0,0],dtype='int8')
>>> a
array([1, 0, 0, 0, 0, 0, 0, 0], dtype=int8)
>>> a.dtype='int64'
>>> a
array([1])

Note how the change from int8 to int64 turned an 8-element array of 8-bit integers into a 1-element array holding one 64-bit integer. It is still the same 8-byte block. On my i7 machine, which is little-endian, that byte pattern is exactly how 1 is stored in int64 format.
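
To make the byte-order point concrete, here is a small sketch that views the same 8 bytes with an explicit byte order instead of the machine's native one. The names raw, as_little and as_big are only illustrative.

```python
import numpy as np

# The same 8 bytes as in the example above: a 1 followed by seven zeros.
raw = np.array([1, 0, 0, 0, 0, 0, 0, 0], dtype=np.uint8)

# '<i8' means little-endian 64-bit int, '>i8' means big-endian 64-bit int.
as_little = raw.view('<i8')
as_big = raw.view('>i8')

print(as_little)  # [1]
print(as_big)     # [72057594037927936], which is 2**56
```

On a little-endian machine the first byte is the least significant one, so the native result is 1. Read as big-endian, the same bytes put that 1 in the most significant position.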

Change the position of the 1:

>>> a=np.array([0,0,0,1,0,0,0,0],dtype='int8')
>>> a.dtype='int64'
>>> a
array([16777216])

Another example:

>>> a=np.array([0,0,0,0,0,0,1,0],dtype='int32')
>>> a.dtype='int64'
>>> a
array([0, 0, 0, 1])

Change the position of the 1 in the same 32-byte array of 32-bit integers:

>>> a=np.array([0,0,0,1,0,0,0,0],dtype='int32')
>>> a.dtype='int64'
>>> a
array([         0, 4294967296,          0,          0])

It is the same block of bits reinterpreted.
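
The same check can be run on the array from the question. This is only a sketch, and it assumes the asker's default integer was 32 bits wide, which the halved length in the question (8 ints becoming 4 floats) suggests.

```python
import numpy as np

# Assumption: the asker's default integer was 32 bits wide, which the
# halved length in the question (8 ints -> 4 floats) suggests.
a = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=np.int32)
before = a.tobytes()           # snapshot of the raw bytes

b = a.view(np.float64)         # reinterpret the buffer, convert nothing
after = b.tobytes()

print(a.shape, b.shape)        # (8,) (4,)
print(a.nbytes, b.nbytes)      # 32 32
print(before == after)         # True
print(b.view(np.int32))        # [1 2 3 4 5 6 7 8]
```

The bytes never change; only the rule for reading them does, which is why viewing the result as int32 again recovers the original values.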

answered Oct 20 '22 15:10

the wolf

First off, the code you're learning from is flawed. It almost certainly doesn't do what the original author thought it did based on the comments in the code.

What the author probably meant was this:

import numpy as np

def to_1d(array):
    """prepares an array into a 1d real vector"""
    return array.astype(np.float64).ravel()

However, if array is always going to be an array of complex numbers, then the original code makes some sense.

Setting a.dtype = 'float64' is equivalent to doing a = a.view('float64'). The only cases where viewing the array this way would double its length are a complex array (numpy.complex128) or a 128-bit floating point array. For any other dtype, it doesn't make much sense.

For the specific case of a complex array, the original code would convert something like np.array([0.5+1j, 9.0+1.33j]) into np.array([0.5, 1.0, 9.0, 1.33]).

A cleaner way to write that would be:

def complex_to_interleaved_real(array):
    """prepares a complex array into an "interleaved" 1d real vector"""
    return array.copy().view('float64').ravel()

(I'm ignoring the part about returning the original dtype and shape, for the moment.)
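
For completeness, here is a rough sketch of how the returned dtype and shape could be used to undo the flattening. It assumes the input is a complex128 array. Unlike the original code, it records the shape before the view is taken, and from_1d is a hypothetical helper that does not appear in the code from the question.

```python
import numpy as np

def to_1d(array):
    """Flatten a complex128 array into an interleaved float64 vector."""
    flat = array.copy().view(np.float64).ravel()
    # Assumption: we keep the *original* dtype and shape for the round trip.
    return flat, (array.dtype, array.shape)

def from_1d(vector, info):
    """Hypothetical inverse of to_1d (not part of the original code)."""
    dtype, shape = info
    return vector.view(dtype).reshape(shape)

z = np.array([[0.5 + 1j, 9.0 + 1.33j],
              [2.0 - 1j, 0.0 + 0j]])
flat, info = to_1d(z)
restored = from_1d(flat, info)

print(flat.shape)                   # (8,)
print(np.array_equal(restored, z))  # True
```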

Background on numpy arrays

To explain what's going on here, you need to understand a bit about what numpy arrays are.

A numpy array consists of a "raw" memory buffer that is interpreted as an array through "views". You can think of all numpy arrays as views.

Views, in the numpy sense, are just a different way of slicing and dicing the same memory buffer without making a copy.

A view has a shape, a data type (dtype), an offset, and strides. Where possible, indexing/reshaping operations on a numpy array will just return a view of the original memory buffer.

This means that things like y = x.T or y = x[::2] don't use any extra memory, and don't make copies of x.
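
A quick way to convince yourself of this is to check that the views share memory with the original and that writing through a view changes it. The sketch below fixes the dtype to int64 so the stride values are predictable.

```python
import numpy as np

x = np.arange(6, dtype=np.int64).reshape(2, 3)
t = x.T          # transposed view
s = x[:, ::2]    # strided view: every other column

print(np.shares_memory(x, t))  # True
print(np.shares_memory(x, s))  # True

# Same buffer, different strides (bytes to step along each axis).
print(x.strides, t.strides, s.strides)  # (24, 8) (8, 24) (24, 16)

t[0, 1] = 99     # write through the view...
print(x[1, 0])   # 99  ...and the original sees it
```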

So, if we have an array similar to this:

import numpy as np
x = np.array([1,2,3,4,5,6,7,8,9,10])

We could reshape it by doing either:

x = x.reshape((2, 5))

or

x.shape = (2, 5)

For readability, the first option is better. They're (almost) exactly equivalent, though. Neither one will make a copy that uses more memory (the first results in a new Python object, but that's beside the point for the moment).
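
As a small illustration of the "new Python object, same memory" point, the sketch below keeps both names around (y is just an illustrative name) so the two objects can be compared:

```python
import numpy as np

x = np.arange(1, 11)
y = x.reshape((2, 5))            # new array object over the same buffer

print(y is x)                    # False: a different Python object
print(np.shares_memory(x, y))    # True: but no data was copied

y[0, 0] = 100                    # modify through the reshaped array
print(x[0])                      # 100
```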

Dtypes and views

The same thing applies to the dtype. We can view an array as a different dtype by either setting x.dtype or by calling x.view(...).

So we can do things like this:

import numpy as np
x = np.array([1,2,3], dtype=np.int64)  # the np.int alias was removed in NumPy 1.24

print('The original array')
print(x)

print('\n...Viewed as unsigned 8-bit integers (notice the length change!)')
y = x.view(np.uint8)
print(y)

print('\n...Doing the same thing by setting the dtype')
x.dtype = np.uint8
print(x)

print('\n...And we can set the dtype again and go back to the original.')
x.dtype = np.int64
print(x)

Which yields:

The original array
[1 2 3]

...Viewed as unsigned 8-bit integers (notice the length change!)
[1 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 3 0 0 0 0 0 0 0]

...Doing the same thing by setting the dtype
[1 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 3 0 0 0 0 0 0 0]

...And we can set the dtype again and go back to the original.
[1 2 3]

Keep in mind, though, that this is giving you low-level control over the way that the memory buffer is interpreted.

For example:

import numpy as np
x = np.arange(10, dtype=np.int64)

print('An integer array:', x)
print('But if we view it as a float:', x.view(np.float64))
print("...It's probably not what we expected...")

This yields:

An integer array: [0 1 2 3 4 5 6 7 8 9]
But if we view it as a float: [  0.00000000e+000   4.94065646e-324   
   9.88131292e-324   1.48219694e-323   1.97626258e-323   
   2.47032823e-323   2.96439388e-323   3.45845952e-323
   3.95252517e-323   4.44659081e-323]
...It's probably not what we expected...

So, we're interpreting the underlying bits of the original memory buffer as floats, in this case.

If we wanted to make a new copy with the ints converted to floats, we'd use x.astype(np.float64).
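
Side by side, the difference between converting and reinterpreting looks roughly like this (converted and viewed are illustrative names):

```python
import numpy as np

x = np.arange(4, dtype=np.int64)

converted = x.astype(np.float64)   # new buffer, values converted
viewed = x.view(np.float64)        # same buffer, bits reinterpreted

print(converted)                       # [0. 1. 2. 3.]
print(np.shares_memory(x, converted))  # False
print(np.shares_memory(x, viewed))     # True

# Viewing back as int64 recovers the original values untouched.
print(viewed.view(np.int64))           # [0 1 2 3]
```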

Complex Numbers

Complex numbers are stored (in C, Python, and NumPy alike) as two floats. The first is the real part and the second is the imaginary part.

So, if we do:

import numpy as np
x = np.array([0.5+1j, 1.0+2j, 3.0+0j])

We can see the real (x.real) and imaginary (x.imag) parts. If we convert this to a float, we'll get a warning about discarding the imaginary part, and we'll get an array with just the real part.

print(x.real)
print(x.astype(float))

astype makes a copy and converts the values to the new type.

However, if we view this array as a float, we'll get a sequence of item1.real, item1.imag, item2.real, item2.imag, ....

print(x)
print(x.view(float))

yields:

[ 0.5+1.j  1.0+2.j  3.0+0.j]
[ 0.5  1.   1.   2.   3.   0. ]

Each complex number is essentially two floats, so if we change how numpy interprets the underlying memory buffer, we get an array of twice the length.
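
As a final check, this short sketch confirms that the even positions of the float view are the real parts and the odd positions are the imaginary parts. The name interleaved is only illustrative.

```python
import numpy as np

x = np.array([0.5 + 1j, 1.0 + 2j, 3.0 + 0j])
interleaved = x.view(np.float64)

print(x.dtype, x.itemsize)        # complex128 16
print(len(x), len(interleaved))   # 3 6

# Even slots hold the real parts, odd slots the imaginary parts.
print(np.array_equal(interleaved[0::2], x.real))  # True
print(np.array_equal(interleaved[1::2], x.imag))  # True
```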

Hopefully that helps clear things up a bit...

answered Oct 20 '22 15:10

Joe Kington
