The following question arose because I was trying to use byte strings as dictionary keys, and bytes values that I understood to be equal weren't being treated as equal.
Why doesn't the following Python code compare equal? Aren't these two equivalent representations of the same binary data (example knowingly chosen to avoid endianness)?
b'0b11111111' == b'0xff'
I know the following evaluates true, demonstrating the equivalence:
int(b'0b11111111', 2) == int(b'0xff', 16)
But why does Python force me to know the representation? Is it related to endianness? Is there some easy way to force these to compare as equivalent, other than converting them all to e.g. hex literals? Can anyone suggest a transparent and clear method to move between all representations in a (somewhat) platform-independent way (or am I asking too much)?
Edit:
Given the comments below, say I want to actually index a dictionary using 8 bits in the form b'0b11111111', then why does Python expand it to ten bytes and how do I prevent that?
This is a smaller piece of a large tree data structure and expanding my indexing by a factor of 80 seems like a huge waste of memory.
Bytes and bytearray objects contain single bytes; the former is immutable while the latter is a mutable sequence. Bytes objects can be constructed with the constructor, bytes(), and from literals; use a b prefix with normal string syntax: b'python'.
Each evaluation of a bytes literal produces a new bytes object. The bytes in the new object are the bytes represented by the shortstringitem or longstringitem parts of the literal, in the same order. The proposed syntax provides a cleaner migration path from Python 2.x to Python 3000 for most code involving 8-bit strings.
Python has different types of literals. A string literal can be created by writing text (a group of characters) surrounded by single ('...'), double ("..."), or triple quotes ('''...''' or """..."""). Triple quotes allow a string to span multiple lines.
Bytes, Bytearray. Python supports a range of types to store sequences. There are six sequence types: strings, byte sequences (bytes objects), byte arrays (bytearray objects), lists, tuples, and range objects. Strings contain Unicode characters. Their literals are written in single or double quotes: 'python', "data".
Bytes can represent any number of things. Python cannot and will not guess at what your bytes might encode.
For example, int(b'0b11111111', 34) is also a valid interpretation, but that interpretation is not equal to hex FF.
The number of interpretations, in fact, is endless. The bytes could represent a series of ASCII codepoints, or image colors, or musical notes.
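To make that concrete, here is a small sketch that reads the same ten bytes in several ways. The variable names are just for illustration; every call used is a standard built-in.

```python
raw = b'0b11111111'

# Interpretation 1: a base-2 integer literal (the '0b' prefix is accepted for base 2).
as_base2 = int(raw, 2)

# Interpretation 2: a base-34 number, where '0', 'b' and '1' are all ordinary digits.
as_base34 = int(raw, 34)

# Interpretation 3: ASCII text.
as_text = raw.decode('ascii')

# Interpretation 4: one large unsigned integer spread over ten bytes, big-endian.
as_big_int = int.from_bytes(raw, byteorder='big')

print(as_base2, as_base34, as_text, as_big_int)
```

Only the first interpretation yields 255; none of them is more "correct" than the others until you decide what the bytes mean.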
Until you explicitly apply an interpretation, the bytes object consists of just a sequence of values in the range 0-255, and the textual representation of those bytes uses ASCII characters wherever a byte is printable as text:
>>> list(bytes(b'0b11111111'))
[48, 98, 49, 49, 49, 49, 49, 49, 49, 49]
>>> list(bytes(b'0xff'))
[48, 120, 102, 102]
Those byte sequences are not equal.
If you want to interpret these sequences explicitly as integer literals, then use ast.literal_eval() to interpret decoded text values; always normalise before comparing:
>>> import ast
>>> ast.literal_eval(b'0b11111111'.decode('utf8'))
255
>>> ast.literal_eval(b'0xff'.decode('utf8'))
255
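For the dictionary-key use case from the question, one way to apply this is to normalise every key to an int before it touches the dictionary. This is only a sketch: to_int_key and table are made-up names for illustration, and it assumes every key really is the ASCII text of an integer literal.

```python
import ast

def to_int_key(raw: bytes) -> int:
    """Parse bytes holding an integer literal (e.g. b'0xff') into an int."""
    value = ast.literal_eval(raw.decode('ascii'))
    # literal_eval also accepts strings, lists, booleans, etc.; reject those here.
    if not isinstance(value, int) or isinstance(value, bool):
        raise ValueError(f"not an integer literal: {raw!r}")
    return value

table = {}
table[to_int_key(b'0b11111111')] = 'leaf'

# Both spellings normalise to the same key, so the lookup succeeds.
print(table[to_int_key(b'0xff')])
```

After normalisation the keys are plain ints, so the textual form they arrived in no longer matters.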
It seems that what you were trying to do is get a byte string representing the value 0b11111111 (or 255). This is not what b'0b11111111' does; that actually stands for a byte string representing the character (Unicode) string '0b11111111'.
What you want would be written as b'\xff'. You can check that it is actually one byte: len(b'\xff') == 1.
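As a quick illustration of single-byte keys, the sketch below builds the same one-byte value three ways and uses it to index a dictionary. The names key_a, key_b, key_c and node are hypothetical; bytes([n]) is the standard way to build a bytes object from a list of integers in the range 0-255.

```python
# Three spellings of the same single byte with value 255.
key_a = bytes([0b11111111])  # from an int written in binary
key_b = bytes([0xff])        # from an int written in hex
key_c = b'\xff'              # as a bytes literal with a hex escape

assert key_a == key_b == key_c
assert len(key_a) == 1

# Because they are equal (and hash equally), any spelling finds the entry.
node = {key_a: 'child'}
print(node[b'\xff'])
```

The integer literals 0b11111111 and 0xff are the same int, so the bytes built from them are the same one-byte object value.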
To convert a Python int to a binary representation, you can use the ctypes library. You need to choose one of the C integer types, e.g.:
>>> import ctypes
>>> bytes(ctypes.c_ubyte(255))
b'\xff'
>>> bytes(ctypes.c_ubyte(0xff))
b'\xff'
>>> bytes(ctypes.c_long(255))  # size and byte order are platform-dependent
b'\xff\x00\x00\x00\x00\x00\x00\x00'
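Note: c_uint8 (i.e. 8-bit unsigned C integer) is an alias for c_ubyte. c_int64 is a 64-bit signed C integer; it matches c_long only on platforms where a C long is 64 bits wide (it is 32 bits on Windows, for example), so prefer c_int64 when you need a fixed width.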
To convert back:
>>> ctypes.c_ubyte.from_buffer_copy(b'\xff').value
255
Be careful about overflow:
>>> ctypes.c_ubyte(256)
c_ubyte(0)
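If silent wrap-around or platform-dependent sizes are a concern, the built-in int.to_bytes() and int.from_bytes() methods are an alternative to ctypes (this is a different technique from the one shown above, not a variant of it). You state the width and byte order explicitly, so the result is the same on every platform. A minimal sketch:

```python
value = 0b11111111

# One byte: byte order is irrelevant for a single byte, but older Python
# versions require the argument, so it is spelled out here.
one_byte = value.to_bytes(1, byteorder='big')
assert one_byte == b'\xff'

# Eight bytes: the byte order now decides where the 0xff ends up.
wide_big = value.to_bytes(8, byteorder='big')
wide_little = value.to_bytes(8, byteorder='little')
assert wide_big == b'\x00' * 7 + b'\xff'
assert wide_little == b'\xff' + b'\x00' * 7

# Converting back.
assert int.from_bytes(one_byte, byteorder='big') == 255

# Unlike ctypes.c_ubyte(256), a value that does not fit raises an error.
try:
    (256).to_bytes(1, byteorder='big')
except OverflowError as exc:
    print('does not fit in one byte:', exc)
```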