Why is Python's "dtype=float" 8 bytes rather than 4 bytes?

I recently switched from Matlab to NumPy/SciPy.

Today I ran into a strange problem when loading data stored in raw binary format. The audio data is stored as 4-byte single-precision floating-point numbers. I first tried the following:

import numpy as np
import matplotlib.pyplot as plt
data = np.fromfile('out.raw', dtype=float)  # This is wrong
plt.plot(data)

But it didn't work. After some searching, I tried the following, and it worked as expected:

data = np.fromfile('out.raw', dtype=np.float32) # This is okay.
plt.plot(data)

Based on my experience with C/C++, I had expected "float" to be a 4-byte single-precision floating-point type. But it turns out that float here means 8-byte data, so in the case above I should have used np.float32.
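To illustrate the mismatch described above, here is a minimal sketch that writes a few float32 samples to a temporary raw file (standing in for the hypothetical 'out.raw'; the sample values are made up) and reads them back both ways. With dtype=float, NumPy reads 8 bytes per value, so each pair of 4-byte samples is reinterpreted as one double: you get half as many values, and they are meaningless.

```python
import os
import tempfile

import numpy as np

# Hypothetical stand-in for the audio data: 8 samples stored as 4-byte float32.
samples = np.linspace(-1.0, 1.0, 8, dtype=np.float32)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "out.raw")  # hypothetical file name
    samples.tofile(path)  # raw bytes, no header: 8 * 4 = 32 bytes

    # dtype=float means float64: every 8 bytes (two float32 samples)
    # are reinterpreted as one double, giving half as many meaningless values.
    wrong = np.fromfile(path, dtype=float)

    # dtype=np.float32 matches how the data was written.
    right = np.fromfile(path, dtype=np.float32)

print(wrong.dtype, wrong.size)          # float64 4
print(right.dtype, right.size)          # float32 8
print(np.array_equal(right, samples))   # True
```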

I have two questions regarding this.

Q1. Why is float 8 bytes rather than 4 bytes? This seems likely to confuse C/C++ programmers.

Q2. Why can't I use dtype=float32? It raises an error for me. It seems I have to use dtype=np.float32 instead?

Thank you!

asked Dec 18 '22 08:12 by chanwcom


1 Answer

This is because float is a native Python data type whose underlying C type is double. This comes from the Python core rather than from NumPy or SciPy; when you pass dtype=float, NumPy maps the Python float to its own float64.
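A quick way to confirm this mapping is sketched below. It assumes a typical CPython build where the float type is an IEEE 754 double; the sample array values are arbitrary.

```python
import sys

import numpy as np

# Passing the Python builtin `float` as a dtype selects NumPy's float64.
print(np.dtype(float))           # float64
print(np.dtype(float).itemsize)  # 8 (bytes)

# Arrays built from Python floats default to float64 as well.
print(np.array([0.5, 1.5]).dtype)  # float64

# Python's own float is an IEEE 754 double on common platforms:
# 53 significand bits (52 stored + 1 implicit).
print(sys.float_info.mant_dig)  # 53
```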

The NumPy types (which SciPy also uses) are more specific and tend to match your expectations:

Data type   Description
bool_       Boolean (True or False) stored as a byte
int_        Default integer type (same as C long; normally either int64 or int32)
intc        Identical to C int (normally int32 or int64)
intp        Integer used for indexing (same as C ssize_t; normally either int32 or int64)
int8        Byte (-128 to 127)
int16       Integer (-32768 to 32767)
int32       Integer (-2147483648 to 2147483647)
int64       Integer (-9223372036854775808 to 9223372036854775807)
uint8       Unsigned integer (0 to 255)
uint16      Unsigned integer (0 to 65535)
uint32      Unsigned integer (0 to 4294967295)
uint64      Unsigned integer (0 to 18446744073709551615)
float_      Shorthand for float64 (removed in NumPy 2.0; use float64)
float16     Half-precision float: sign bit, 5 bits exponent, 10 bits mantissa
float32     Single-precision float: sign bit, 8 bits exponent, 23 bits mantissa
float64     Double-precision float: sign bit, 11 bits exponent, 52 bits mantissa
complex_    Shorthand for complex128 (removed in NumPy 2.0; use complex128)
complex64   Complex number, represented by two 32-bit floats (real and imaginary components)
complex128  Complex number, represented by two 64-bit floats (real and imaginary components)
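The sketch below checks a few rows of the table against NumPy itself and also touches on Q2: these names are attributes of the numpy module, not Python builtins, so a bare float32 is undefined, while np.float32 or the string "float32" both work. The selection of types printed here is arbitrary.

```python
import numpy as np

# A bare `float32` is not a builtin name, hence the error in Q2.
try:
    float32
except NameError:
    print("float32 is not defined; use np.float32 or 'float32'")

# The attribute and the string spell the same dtype.
assert np.dtype("float32") == np.dtype(np.float32)

# Storage size in bytes for a few entries.
names = ["int8", "int16", "int32", "int64",
         "float16", "float32", "float64", "complex64", "complex128"]
for name in names:
    print(name, np.dtype(name).itemsize)

# Exponent and mantissa bit counts match the table.
for t in (np.float16, np.float32, np.float64):
    info = np.finfo(t)
    print(t.__name__, info.nexp, info.nmant)  # e.g. float32 8 23

# Integer ranges.
print(np.iinfo(np.int8).min, np.iinfo(np.int8).max)  # -128 127
print(np.iinfo(np.uint16).max)                       # 65535
```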

If your question is about why core Python uses the term float when the underlying C type is double, the answer is that Python aims to provide a higher level of abstraction than a low-level language like C. The term float represents the concept of a floating-point number rather than a specific C type such as float or double, each of which implies a particular size.
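Python exposes only one float type, but the standard struct module can still pack a value into C's 4-byte float ("f") or 8-byte double ("d"). This minimal sketch uses 0.1 as an arbitrary example value and the "<" prefix so that standard sizes apply on every platform.

```python
import struct

# "<" selects standard sizes: 4 bytes for "f", 8 bytes for "d".
print(struct.calcsize("<f"), struct.calcsize("<d"))  # 4 8

x = 0.1
as_float = struct.unpack("<f", struct.pack("<f", x))[0]
as_double = struct.unpack("<d", struct.pack("<d", x))[0]

print(as_double == x)  # True: Python's float is already a double
print(as_float == x)   # False: the 4-byte float drops precision
print(as_float)        # 0.10000000149011612
```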

In contrast, NumPy allows lower-level control of exact size and memory layout. This is the key to its optimizations. However, those optimizations and the ability to control details come at the cost of moving the code away from the high-level abstraction of "what you're trying to do" and into the world of "specifying the details of how it is done".
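As a rough illustration of that lower-level control, the sketch below compares storage cost for the same number of samples, pins the byte order with a dtype string, and converts between widths explicitly. The sample count (one second of hypothetical 48 kHz mono audio) and the array values are assumptions for illustration; a string such as "<f4" could be passed to np.fromfile when a raw file's byte order is known.

```python
import numpy as np

n = 48_000  # hypothetical: one second of mono audio at 48 kHz

# Same number of samples, different storage cost.
print(np.zeros(n, dtype=np.float32).nbytes)  # 192000
print(np.zeros(n, dtype=np.float64).nbytes)  # 384000

# Dtype strings can pin the byte order: "<f4" is a little-endian
# 4-byte float, ">f4" a big-endian one.
le = np.dtype("<f4")
be = np.dtype(">f4")
print(le.itemsize, be.itemsize)  # 4 4
print(le == be)                  # False: same size, different layout

# Converting between widths is explicit.
x64 = np.array([0.1, 0.2, 0.3])
x32 = x64.astype(np.float32)
print(x32.dtype, x32.nbytes)     # float32 12
```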

answered Dec 21 '22 11:12 by Raymond Hettinger