I have the following code snippet:
#!/usr/bin/env python3
print(float(b'5'))
Which prints 5.0 with no error (on Linux with utf-8 encoding). I'm very surprised that it doesn't give an error, since Python is not supposed to know what encoding is used for the bytes object.
Any insight?
Integers and floating-point numbers can be mixed in arithmetic. Python 3 automatically converts integers to floats as needed.
Python float uses 8 bytes (or 64 bits) to represent real numbers. Unlike the integer type, the float type uses a fixed number of bytes.
The bytes() function returns a bytes object. It can convert objects into bytes objects, or create empty bytes object of the specified size. The difference between bytes() and bytearray() is that bytes() returns an object that cannot be modified, and bytearray() returns an object that can be modified.
Bytes objects can be constructed with the constructor, bytes(), and from literals; use a b prefix with normal string syntax: b'python'. To construct byte arrays, use the bytearray() function.
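As a small illustration of the bytes versus bytearray distinction described above, here is a minimal sketch. The variable names are made up for the example.

```python
# bytes(n) creates an immutable object of n zero bytes
zeros = bytes(3)

# A literal with the b prefix also produces an immutable bytes object
literal = b'python'

# bytearray is the mutable counterpart: item assignment works
buf = bytearray(b'python')
buf[0] = ord('P')

# The same assignment on a bytes object raises TypeError
try:
    literal[0] = ord('P')
    bytes_is_mutable = True
except TypeError:
    bytes_is_mutable = False

print(zeros, buf, bytes_is_mutable)
```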
Float to byte array conversion: the size of a float in Java is 32 bits, the same as an int. So we can use the floatToIntBits or floatToRawIntBits functions available in Java's Float class, and then shift the bits to produce a byte array.
When passed a bytes object, float() treats the contents of the object as ASCII bytes. That's sufficient here, as the conversion from string to float only accepts ASCII digits and letters, plus . and _ anyway (the only non-ASCII codepoints that would be permitted are whitespace codepoints), and this is analogous to the way int() treats bytes input.
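The behaviour described above can be checked from the interpreter. This is a minimal sketch assuming CPython 3.6 or later (for underscore support in numeric literals); the variable names are only for illustration.

```python
# bytes input is parsed as ASCII text, just like the equivalent str would be
plain = float(b'5')
with_underscore = float(b' 1_000.5 ')   # ASCII whitespace and _ are accepted
as_int = int(b'5')                      # int() treats bytes the same way

# The bytes are never decoded, so a UTF-8 encoded non-ASCII digit
# (here U+0663, ARABIC-INDIC DIGIT THREE) is rejected.
try:
    float('\u0663'.encode('utf-8'))
    non_ascii_rejected = False
except ValueError:
    non_ascii_rejected = True

print(plain, with_underscore, as_int, non_ascii_rejected)
```

Because only ASCII bytes are ever meaningful to the parser, no knowledge of the encoding is needed, which is why no error is raised for b'5'.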
Under the hood, the implementation does this:
- PyNumber_Float() is called on the object (for str objects the code jumps straight to PyFloat_FromString).
- PyNumber_Float() checks for a __float__ method, but if that's not available, it calls PyFloat_FromString().
- PyFloat_FromString() accepts not only str objects, but any object implementing the buffer protocol. The String name is a Python 2 holdover; the Python 3 str type is called Unicode in the C implementation.
- bytes objects implement the buffer protocol, and the PyBytes_AS_STRING macro is used to access the internal C buffer holding the bytes.
- _Py_string_to_number_with_underscores() and float_from_string_inner() are then used to parse ASCII bytes into a floating point value.

For actual str strings, the CPython implementation converts any non-ASCII string into a sequence of ASCII bytes by only looking at ASCII codepoints in the input value, and converting any non-ASCII whitespace character to ASCII 0x20 spaces, to then use the same _Py_string_to_number_with_underscores() / float_from_string_inner() combo.
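The dispatch order described above can be observed from Python without reading the C source. The following is a rough sketch assuming CPython 3; the Half class is hypothetical and exists only to show that __float__ takes precedence.

```python
class Half:
    """Hypothetical class: __float__ is consulted before any string parsing."""
    def __float__(self):
        return 0.5

via_dunder = float(Half())

# Objects without __float__ that expose the buffer protocol are parsed as ASCII
via_bytearray = float(bytearray(b'2.5'))
via_memoryview = float(memoryview(b'2.5'))

# For str, non-ASCII whitespace such as EM SPACE (U+2003) is normalised
# to an ASCII space before parsing, so it is stripped like ordinary padding.
EM_SPACE = '\u2003'
padded = EM_SPACE + '2.5' + EM_SPACE
via_unicode_space = float(padded)

# The same text as UTF-8 bytes is not normalised, so parsing fails.
try:
    float(padded.encode('utf-8'))
    padded_bytes_rejected = False
except ValueError:
    padded_bytes_rejected = True

print(via_dunder, via_bytearray, via_memoryview,
      via_unicode_space, padded_bytes_rejected)
```

The last two cases show the asymmetry: str input goes through a Unicode-aware normalisation step first, while bytes input is handed to the ASCII parser as-is.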
I see this as a bug in the documentation and have filed an issue with the Python project to have it updated.