
How does Python convert bytes into float?

I have the following code snippet:

#!/usr/bin/env python3

print(float(b'5'))

This prints 5.0 with no error (on Linux with a UTF-8 locale). I'm very surprised that it doesn't raise an error, since Python is not supposed to know which encoding the bytes object uses.

Any insight?

static_rtti asked May 18 '18 10:05


1 Answer

When passed a bytes object, float() treats the contents of the object as ASCII bytes. That's sufficient here, as the conversion from string to float only accepts ASCII digits and letters, plus the sign characters, . and _ anyway (for str input, the only non-ASCII codepoints permitted are whitespace and Unicode decimal digits, and both are mapped to ASCII before parsing). This is analogous to the way int() treats bytes input.
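As a quick check of the behaviour described above, the following sketch feeds a few bytes objects to float(). The helper name parse_bytes is made up for illustration; everything else is the built-in float() and int().

```python
def parse_bytes(raw: bytes):
    """Return float(raw), or None if float() rejects the bytes."""
    try:
        return float(raw)
    except ValueError:
        return None


# Plain ASCII digits parse just like the equivalent str would.
ascii_result = parse_bytes(b'5')

# Leading and trailing ASCII whitespace is ignored.
spaced_result = parse_bytes(b'  2.5e3\n')

# The UTF-8 encoding of FULLWIDTH DIGIT FIVE is three non-ASCII bytes,
# so the ASCII-only parser refuses it.
fullwidth_result = parse_bytes('５'.encode('utf-8'))

# int() accepts ASCII bytes in the same way.
int_result = int(b'5')

print(ascii_result, spaced_result, fullwidth_result, int_result)
```

No decoding step happens here: bytes outside the ASCII range are rejected rather than interpreted under some guessed encoding.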

Under the hood, the implementation does this:

  • Because the input is not a string, PyNumber_Float() is called on the object (for str objects the code jumps straight to PyFloat_FromString()).
  • PyNumber_Float() checks for a __float__ method; if that's not available, it calls PyFloat_FromString().
  • PyFloat_FromString() accepts not only str objects, but any object implementing the buffer protocol. The String name is a Python 2 holdover; the Python 3 str type is called Unicode in the C implementation.
  • bytes objects implement the buffer protocol, and the PyBytes_AS_STRING macro is used to access the internal C buffer holding the bytes.
  • A combination of two internal functions named _Py_string_to_number_with_underscores() and float_from_string_inner() is then used to parse ASCII bytes into a floating point value.
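The steps above can be approximated in pure Python. This is only a rough sketch of the dispatch order, not the CPython code: float_sketch and Half are hypothetical names, memoryview() stands in for the C-level buffer access, and float() on an ASCII-decoded string stands in for the internal C parser.

```python
def float_sketch(obj):
    """Hypothetical pure-Python outline of float()'s argument dispatch."""
    # str input goes straight to the string parser.
    if isinstance(obj, str):
        return float(obj)  # stand-in for PyFloat_FromString() on str

    # Other objects: prefer a __float__ method, looked up on the type.
    float_method = getattr(type(obj), "__float__", None)
    if float_method is not None:
        return float_method(obj)

    # No __float__: fall back to the buffer protocol.
    try:
        raw = memoryview(obj).tobytes()
    except TypeError:
        raise TypeError(
            f"argument must be a string or a number, not {type(obj).__name__!r}"
        ) from None

    # Stand-in for the C parser: decode strictly as ASCII, then parse.
    # Non-ASCII bytes raise UnicodeDecodeError, a subclass of ValueError.
    return float(raw.decode("ascii"))


class Half:
    """Example object that provides its own __float__."""

    def __float__(self):
        return 0.5


print(float_sketch(b'5'), float_sketch(bytearray(b'1.5')), float_sketch(Half()))
```

Because the fallback only requires the buffer protocol, bytearray and memoryview inputs take the same path as bytes.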

For actual str strings, the CPython implementation first converts the input into a sequence of ASCII bytes: non-ASCII whitespace characters become ASCII 0x20 spaces, Unicode decimal digits become the corresponding ASCII digits, and any other non-ASCII codepoint makes the conversion fail. The result then goes through the same _Py_string_to_number_with_underscores() / float_from_string_inner() combo.
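The difference between the str and bytes paths shows up with non-ASCII whitespace. The sketch below uses a no-break space (U+00A0) as the example character; the variable names are made up for illustration.

```python
NBSP = '\u00a0'  # NO-BREAK SPACE, whitespace for str but not ASCII

# As a str, the no-break spaces count as whitespace and are stripped.
str_result = float(NBSP + '5' + NBSP)

# The same text encoded as UTF-8 bytes contains non-ASCII bytes,
# which the ASCII-only bytes path rejects.
try:
    bytes_result = float((NBSP + '5' + NBSP).encode('utf-8'))
except ValueError:
    bytes_result = None

print(str_result, bytes_result)
```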

I see this as a bug in the documentation and have filed an issue with the Python project to have it updated.

Martijn Pieters answered Nov 10 '22 01:11