These days, I've switched from Matlab to NumPy/SciPy.
Today, I ran into a weird problem when I tried to load audio data stored in a raw binary format, where each sample is a 4-byte single-precision floating-point number. I tried the following first.
import numpy as np
import matplotlib.pyplot as plt

data = np.fromfile('out.raw', dtype=float)  # This is wrong: float is read as 8-byte float64
plt.plot(data)
But it didn't work. After some searching, I tried the following, and it worked as expected:
data = np.fromfile('out.raw', dtype=np.float32) # This is okay.
plt.plot(data)
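A minimal, self-contained sketch of the working approach, assuming the raw file holds little-endian 4-byte floats with no header (the file name demo.raw and the sine test signal are hypothetical stand-ins for out.raw). Spelling the dtype as '<f4' pins both the size (4 bytes) and the byte order, whereas np.float32 alone uses the machine's native byte order.

```python
import os
import tempfile

import numpy as np

# Hypothetical test signal: 1000 samples of a 440 Hz sine at 8 kHz, stored as little-endian float32
samples = np.sin(2 * np.pi * 440 * np.arange(1000) / 8000).astype('<f4')

path = os.path.join(tempfile.mkdtemp(), 'demo.raw')
samples.tofile(path)  # writes raw bytes only, no header

# '<f4' = little-endian, 4-byte float; same as np.float32 on a little-endian machine
data = np.fromfile(path, dtype='<f4')
print(data.dtype, data.size)
```

If the file came from a big-endian source, '>f4' would be the assumption to use instead.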
Based on my previous experience with C/C++, I had expected "float" to be a 4-byte single-precision floating-point type. But it turns out that float here means 8-byte (double-precision) data, and in the above case I should have used np.float32.
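To see why dtype=float goes wrong, compare the item sizes. This sketch (using a hypothetical temporary file) writes four float32 values and reads them back both ways: reading them as 8-byte floats glues pairs of adjacent 4-byte values together, so you get half as many elements and meaningless numbers.

```python
import os
import tempfile

import numpy as np

print(np.dtype(float).itemsize)       # 8 -> Python float maps to float64
print(np.dtype(np.float32).itemsize)  # 4

path = os.path.join(tempfile.mkdtemp(), 'four_floats.raw')
np.array([1.0, 2.0, 3.0, 4.0], dtype=np.float32).tofile(path)  # 16 bytes on disk

wrong = np.fromfile(path, dtype=float)       # 16 bytes / 8 = 2 values, garbage
right = np.fromfile(path, dtype=np.float32)  # 16 bytes / 4 = 4 values
print(wrong.size, right.size)  # 2 4
print(right)                   # [1. 2. 3. 4.]
```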
I have two questions regarding this.
Q1. Why is float 8 bytes rather than 4 bytes? This might be confusing to C/C++ programmers.
Q2. Why can't I use dtype=float32? It raises an error for me. It seems like I have to use dtype=np.float32?
Thank you!
This is because float is a native Python data type whose underlying C type is double. This comes from the Python core rather than from NumPy or SciPy.
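The mapping can be checked directly, and the same idea explains Q2: float is a Python built-in, so it is always in scope, whereas float32 is a name defined inside the numpy module, so the bare name raises NameError unless it is imported (for example with from numpy import float32). A short sketch:

```python
import numpy as np

# The built-in float becomes NumPy's float64 (a C double)
print(np.dtype(float))  # float64

# A bare float32 is not a built-in name
try:
    float32
except NameError as exc:
    print('NameError:', exc)

# These spellings all name the same 4-byte type
for spec in (np.float32, 'float32', 'f4'):
    print(repr(spec), np.dtype(spec))
```

The string forms ('float32', 'f4') avoid the namespace question entirely, since NumPy parses them itself.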
NumPy's types (which SciPy also uses) are more specific and tend to match your expectations:
bool_       Boolean (True or False) stored as a byte
int_        Default integer type (same as C long; normally either int64 or int32)
intc        Identical to C int (normally int32 or int64)
intp        Integer used for indexing (same as C ssize_t; normally either int32 or int64)
int8        Byte (-128 to 127)
int16       Integer (-32768 to 32767)
int32       Integer (-2147483648 to 2147483647)
int64       Integer (-9223372036854775808 to 9223372036854775807)
uint8       Unsigned integer (0 to 255)
uint16      Unsigned integer (0 to 65535)
uint32      Unsigned integer (0 to 4294967295)
uint64      Unsigned integer (0 to 18446744073709551615)
float_      Shorthand for float64 (removed in NumPy 2.0; use float64)
float16     Half-precision float: sign bit, 5 bits exponent, 10 bits mantissa
float32     Single-precision float: sign bit, 8 bits exponent, 23 bits mantissa
float64     Double-precision float: sign bit, 11 bits exponent, 52 bits mantissa
complex_    Shorthand for complex128 (removed in NumPy 2.0; use complex128)
complex64   Complex number, represented by two 32-bit floats (real and imaginary components)
complex128  Complex number, represented by two 64-bit floats (real and imaginary components)
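The sizes and ranges in the table can be queried at runtime with np.iinfo and np.finfo, which is handy when the platform-dependent entries (int_, intc, intp) matter. A sketch:

```python
import numpy as np

# Integer ranges
for t in (np.int8, np.int16, np.uint8, np.uint16):
    info = np.iinfo(t)
    print(t.__name__, info.min, info.max)

# Float layouts: exponent bits (nexp) and mantissa bits (nmant)
for t in (np.float16, np.float32, np.float64):
    info = np.finfo(t)
    print(t.__name__, np.dtype(t).itemsize, 'bytes,',
          info.nexp, 'exponent bits,', info.nmant, 'mantissa bits')

# Complex types are two floats side by side
print(np.dtype(np.complex64).itemsize, np.dtype(np.complex128).itemsize)  # 8 16

# Platform-dependent sizes (in bytes)
for t in (np.intc, np.intp):
    print(t.__name__, np.dtype(t).itemsize)
```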
If your question is about why core Python uses the term float when the underlying C type is double, the answer is that Python aims for a higher level of abstraction than a low-level language like C. The term float represents the concept of a floating-point number rather than a specific C type such as float or double, which fix the size.
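You can confirm from core Python alone, without NumPy, that float is backed by a C double: sys.float_info describes it, and the struct module reports the C sizes. The values in the comments assume IEEE 754 hardware, which covers essentially all current platforms.

```python
import struct
import sys

# Python's float has a 53-bit significand (52 stored bits + implicit bit), i.e. a C double
print(sys.float_info.mant_dig)  # 53 on IEEE 754 platforms

# C sizes via struct: 'f' is a C float, 'd' is a C double
print(struct.calcsize('f'), struct.calcsize('d'))  # 4 8

# Packing 0.1 as a 4-byte float loses precision; as an 8-byte double it round-trips
as_float = struct.unpack('f', struct.pack('f', 0.1))[0]
as_double = struct.unpack('d', struct.pack('d', 0.1))[0]
print(as_float == 0.1, as_double == 0.1)  # False True
```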
In contrast, NumPy gives you lower-level control over the exact size and memory layout of your data, and this is the key to its optimizations. However, those optimizations and that control over details come at a cost: they move the code away from the high-level abstraction of "what you're trying to do" and into the world of "specifying the details of how it is done".
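As a rough illustration of that trade-off, this sketch stores the same values at two precisions: float32 halves the memory (and the bytes a raw file needs) but keeps only roughly 6 to 7 significant decimal digits. The array length is an arbitrary choice for the example.

```python
import numpy as np

x64 = np.linspace(0.0, 1.0, 1_000_000)  # default dtype is float64
x32 = x64.astype(np.float32)            # explicit, smaller type

print(x64.nbytes, x32.nbytes)  # 8000000 4000000

# The cost: float32 rounds values more coarsely
print(np.finfo(np.float32).eps, np.finfo(np.float64).eps)
print(np.abs(x64 - x32.astype(np.float64)).max())  # small but nonzero
```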