
What does .dtype do?

Tags:

python

numpy

I am new to Python, and don't understand what .dtype does.
For example:

>>> aa
array([1, 2, 3, 4, 5, 6, 7, 8])
>>> aa.dtype = "float64"
>>> aa
array([  4.24399158e-314,   8.48798317e-314,   1.27319747e-313,
     1.69759663e-313])

I thought dtype was a property of aa, which should be int, and that if I assigned aa.dtype = "float64",
then aa would become array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]).

Why does it change the values and the size?
What does it mean?

I was actually learning from a piece of code, which I'll paste here:

def to_1d(array):
    """prepares an array into a 1d real vector"""
    a = array.copy()  # copy the array, to avoid changing global
    orig_dtype = a.dtype
    a.dtype = "float64"  # this doubles the size of array
    orig_shape = a.shape
    return a.ravel(), (orig_dtype, orig_shape)  # flatten and return

I think it shouldn't change the values of the input array, only its size. I'm confused about how the function works.

user1233157 asked Feb 26 '12 20:02




2 Answers

By changing the dtype in this way, you are changing the way a fixed block of memory is being interpreted.

Example:

>>> import numpy as np
>>> a=np.array([1,0,0,0,0,0,0,0],dtype='int8')
>>> a
array([1, 0, 0, 0, 0, 0, 0, 0], dtype=int8)
>>> a.dtype='int64'
>>> a
array([1])

Note how the change from int8 to int64 turned an 8-element array of 8-bit integers into a 1-element array holding one 64-bit integer. It is the same 8-byte block, however. On my i7 machine with native (little-endian) endianness, that byte pattern is the same as 1 in int64 format.

Change the position of the 1:

>>> a=np.array([0,0,0,1,0,0,0,0],dtype='int8')
>>> a.dtype='int64'
>>> a
array([16777216])

Another example:

>>> a=np.array([0,0,0,0,0,0,1,0],dtype='int32')
>>> a.dtype='int64'
>>> a
array([0, 0, 0, 1])

Change the position of the 1 in the 32-byte array of 32-bit integers:

>>> a=np.array([0,0,0,1,0,0,0,0],dtype='int32')
>>> a.dtype='int64'
>>> a
array([         0, 4294967296,          0,          0]) 

It is the same block of bits reinterpreted.
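Applied to the array in the question, the same reasoning can be sketched as follows. This assumes the asker's aa held 32-bit integers (which is what the drop from 8 elements to 4 suggests), and it uses explicit little-endian dtype strings so the result does not depend on the machine. It also uses view() rather than assigning to .dtype; the two reinterpret the buffer in the same way.

```python
import numpy as np

# Eight 32-bit integers occupy 32 bytes. The "<" prefix pins little-endian
# byte order so the example behaves the same on any machine.
aa = np.arange(1, 9, dtype="<i4")

# Reinterpret the same 32 bytes as four 64-bit floats. No value is converted.
as_float = aa.view("<f8")

# Convert the values instead. This allocates a new array of eight floats.
converted = aa.astype("<f8")

print(as_float.shape, as_float)    # four tiny subnormal numbers
print(converted.shape, converted)  # eight ordinary floats, 1.0 through 8.0

# The raw bytes are identical before and after the view...
same_bytes = aa.tobytes() == as_float.tobytes()
# ...so viewing the floats as 32-bit integers again recovers the original values.
round_trip = as_float.view("<i4")
print(same_bytes, round_trip)
```

If the goal was array([1.0, 2.0, ...]), astype is the call to use; the view only changes how the existing bytes are read.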

the wolf answered Oct 20 '22 15:10



First off, the code you're learning from is flawed. It almost certainly doesn't do what the original author thought it did based on the comments in the code.

What the author probably meant was this:

import numpy as np

def to_1d(array):
    """prepares an array into a 1d real vector"""
    return array.astype(np.float64).ravel()

However, if array is always going to be an array of complex numbers, then the original code makes some sense.

Viewing the array this way (a.dtype = 'float64' is equivalent to doing a = a.view('float64')) doubles the number of elements only if the input is a complex array (numpy.complex128) or a 128-bit floating point array. For any other dtype, it doesn't make much sense.

For the specific case of a complex array, the original code would convert something like np.array([0.5+1j, 9.0+1.33j]) into np.array([0.5, 1.0, 9.0, 1.33]).

A cleaner way to write that would be:

def complex_to_interleaved_real(array):
    """prepares a complex array into an "interleaved" 1d real vector"""
    return array.copy().view('float64').ravel()

(I'm ignoring the part about returning the original dtype and shape, for the moment.)
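For completeness, here is one way the dtype and shape bookkeeping might look, including the reverse step. This is only a sketch: it assumes the input is always complex128, and the names complex_to_real_1d and real_1d_to_complex are made up for illustration, not taken from the original code.

```python
import numpy as np

def complex_to_real_1d(array):
    """Return an interleaved 1-D float64 vector plus what is needed to undo it.

    Assumes `array` has dtype complex128 (hypothetical helper).
    """
    a = array.copy()  # C-contiguous copy, so the caller's array is untouched
    info = (a.dtype, a.shape)
    return a.view(np.float64).ravel(), info

def real_1d_to_complex(vector, info):
    """Inverse of complex_to_real_1d (hypothetical helper)."""
    dtype, shape = info
    # Pairs of floats are read back as (real, imag), then the shape is restored.
    return vector.view(dtype).reshape(shape)

z = np.array([[0.5 + 1j, 9.0 + 1.33j],
              [1.0 + 2j, 3.0 + 0j]])
flat, info = complex_to_real_1d(z)
restored = real_1d_to_complex(flat, info)

print(flat)
print(restored)
```

Returning (dtype, shape) alongside the vector is what makes the round trip possible, which is presumably why the original function kept them.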


Background on numpy arrays

To explain what's going on here, you need to understand a bit about what numpy arrays are.

A numpy array consists of a "raw" memory buffer that is interpreted as an array through "views". You can think of all numpy arrays as views.

Views, in the numpy sense, are just a different way of slicing and dicing the same memory buffer without making a copy.

A view has a shape, a data type (dtype), an offset, and strides. Where possible, indexing/reshaping operations on a numpy array will just return a view of the original memory buffer.

This means that things like y = x.T or y = x[::2] don't use any extra memory, and don't make copies of x.
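A quick way to check this for yourself is np.shares_memory, along with writing through the view and watching the original change. The variable names below are just for illustration.

```python
import numpy as np

x = np.arange(10)

y = x[::2]              # every other element, as a view
t = x.reshape(2, 5).T   # reshaped and transposed, still a view

# Both views point into x's memory buffer rather than owning a copy.
shares = np.shares_memory(x, y) and np.shares_memory(x, t)
print(shares)

# The view only differs in its metadata: here the stride is twice the item size.
print(x.strides, y.strides)

# Writing through the view changes the original array.
y[0] = 99
print(x[0])
```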

So, if we have an array similar to this:

import numpy as np
x = np.array([1,2,3,4,5,6,7,8,9,10])

We could reshape it by doing either:

x = x.reshape((2, 5))

or

x.shape = (2, 5)

For readability, the first option is better. They're (almost) exactly equivalent, though. Neither one makes a copy that uses more memory (the first results in a new Python object, but that's beside the point at the moment).


Dtypes and views

The same thing applies to the dtype. We can view an array as a different dtype either by setting x.dtype or by calling x.view(...).

So we can do things like this:

import numpy as np
x = np.array([1, 2, 3], dtype=np.int64)

print('The original array')
print(x)

print('\n...Viewed as unsigned 8-bit integers (notice the length change!)')
y = x.view(np.uint8)
print(y)

print('\n...Doing the same thing by setting the dtype')
x.dtype = np.uint8
print(x)

print('\n...And we can set the dtype again and go back to the original.')
x.dtype = np.int64
print(x)

Which yields:

The original array
[1 2 3]

...Viewed as unsigned 8-bit integers (notice the length change!)
[1 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 3 0 0 0 0 0 0 0]

...Doing the same thing by setting the dtype
[1 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 3 0 0 0 0 0 0 0]

...And we can set the dtype again and go back to the original.
[1 2 3]

Keep in mind, though, that this is giving you low-level control over the way that the memory buffer is interpreted.

For example:

import numpy as np
x = np.arange(10, dtype=np.int64)

print('An integer array:', x)
print('But if we view it as a float:', x.view(np.float64))
print("...It's probably not what we expected...")

This yields:

An integer array: [0 1 2 3 4 5 6 7 8 9]
But if we view it as a float: [  0.00000000e+000   4.94065646e-324   
   9.88131292e-324   1.48219694e-323   1.97626258e-323   
   2.47032823e-323   2.96439388e-323   3.45845952e-323
   3.95252517e-323   4.44659081e-323]
...It's probably not what we expected...

So, we're interpreting the underlying bits of the original memory buffer as floats, in this case.

If we wanted to make a new copy with the ints recast as floats, we'd use x.astype(np.float64).
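To put the two side by side, here is a small sketch contrasting view and astype on the same 64-bit integer array. The odd-looking numbers in the view come from reading integer bit patterns as IEEE 754 doubles; for instance, the bits of the integer 1 are exactly the bits of the smallest positive subnormal double.

```python
import numpy as np

x = np.arange(10, dtype=np.int64)

viewed = x.view(np.float64)        # same bytes, read as 64-bit floats
converted = x.astype(np.float64)   # new array, values converted to floats

print(viewed[:3])      # tiny subnormal numbers
print(converted[:3])   # 0.0, 1.0, 2.0

# The view aliases x's buffer; the converted array owns separate memory.
view_shares = np.shares_memory(x, viewed)
copy_shares = np.shares_memory(x, converted)
print(view_shares, copy_shares)
```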


Complex Numbers

Complex numbers are stored (in C, Python, and numpy alike) as two floats. The first is the real part and the second is the imaginary part.

So, if we do:

import numpy as np
x = np.array([0.5+1j, 1.0+2j, 3.0+0j])

We can see the real (x.real) and imaginary (x.imag) parts. If we convert this to a float, we'll get a warning about discarding the imaginary part, and we'll get an array with just the real part.

print(x.real)
print(x.astype(float))

astype makes a copy and converts the values to the new type.
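If you want to see the warning without it scrolling past, something like the following should work. It records whatever warnings the conversion raises rather than naming the warning class, since where that class lives has moved between numpy versions.

```python
import warnings
import numpy as np

x = np.array([0.5 + 1j, 1.0 + 2j, 3.0 + 0j])

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")    # make sure the warning is not suppressed
    real_only = x.astype(np.float64)   # a copy; the imaginary parts are dropped

print(real_only)
print([str(w.message) for w in caught])
```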

However, if we view this array as a float, we'll get a sequence of item1.real, item1.imag, item2.real, item2.imag, ....

print(x)
print(x.view(float))

yields:

[ 0.5+1.j  1.0+2.j  3.0+0.j]
[ 0.5  1.   1.   2.   3.   0. ]

Each complex number is essentially two floats, so if we change how numpy interprets the underlying memory buffer, we get an array of twice the length.
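One way to convince yourself of the interleaving is to slice the float view with a step of 2 and compare against x.real and x.imag. The 2-D array at the end is an extra illustration, not from the question: for a C-contiguous array it is the last axis that doubles.

```python
import numpy as np

x = np.array([0.5 + 1j, 1.0 + 2j, 3.0 + 0j])   # complex128 by default
flat = x.view(np.float64)                      # real, imag, real, imag, ...

# Even positions hold the real parts, odd positions the imaginary parts.
real_matches = np.array_equal(flat[0::2], x.real)
imag_matches = np.array_equal(flat[1::2], x.imag)
print(real_matches, imag_matches)

# For a C-contiguous 2-D complex array, only the last axis doubles.
grid = np.zeros((2, 3), dtype=np.complex128)
grid_shape = grid.view(np.float64).shape
print(grid_shape)
```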

Hopefully that helps clear things up a bit...

Joe Kington answered Oct 20 '22 15:10
