Python Comparison of byte literals

The following question arose because I was trying to use byte strings as dictionary keys, and bytes values that I understood to be equal weren't being treated as equal.

Why doesn't the following Python code compare equal? Aren't these two equivalent representations of the same binary data (example knowingly chosen to avoid endianness)?

b'0b11111111' == b'0xff'

I know the following evaluates true, demonstrating the equivalence:

int(b'0b11111111', 2) == int(b'0xff', 16)

But why does Python force me to know the representation? Is it related to endianness? Is there some easy way to force these to compare equal other than converting them all to e.g. hex literals? Can anyone suggest a transparent and clear method to move between all representations in a (somewhat) platform-independent way (or am I asking too much)?

Edit:

Given the comments below, say I actually want to index a dictionary using 8 bits in the form b'0b11111111'. Why does Python expand it to ten bytes, and how do I prevent that?

This is a small piece of a large tree data structure, and expanding each 8-bit index to ten bytes (a factor of 10) seems like a huge waste of memory.

Matthew Hemke asked Jul 19 '14 16:07



2 Answers

Bytes can represent any number of things. Python cannot and will not guess at what your bytes might encode.

For example, int(b'0b11111111', 34) is also a valid interpretation, but that interpretation is not equal to hex FF.

The number of interpretations, in fact, is endless. The bytes could represent a series of ASCII codepoints, or image colors, or musical notes.

Until you explicitly apply an interpretation, the bytes object consists of just a sequence of values in the range 0-255, and the textual representation of those bytes uses ASCII characters wherever a byte is representable as printable text:

>>> list(bytes(b'0b11111111'))
[48, 98, 49, 49, 49, 49, 49, 49, 49, 49]
>>> list(bytes(b'0xff'))
[48, 120, 102, 102]

Those byte sequences are not equal.

If you want to interpret these sequences explicitly as integer literals, then use ast.literal_eval() to interpret decoded text values; always normalise first before comparison:

>>> import ast
>>> ast.literal_eval(b'0b11111111'.decode('utf8'))
255
>>> ast.literal_eval(b'0xff'.decode('utf8'))
255
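
As a minimal sketch of that normalisation step applied to the dictionary-key use case from the question: the helper name to_int and the table contents below are hypothetical, and the sketch swaps ast.literal_eval() for int() with base 0, which also picks the base from the 0b / 0o / 0x prefix but accepts only integer literals.

```python
def to_int(raw: bytes) -> int:
    """Parse ASCII bytes holding an integer literal (0b..., 0o..., 0x... or decimal)."""
    # Base 0 asks int() to infer the base from the literal's prefix.
    return int(raw.decode("ascii"), 0)


# Hypothetical lookup table keyed by the normalised integer, not the raw text.
table = {to_int(b"0b11111111"): "leaf"}

print(to_int(b"0b11111111") == to_int(b"0xff"))  # True
print(table[to_int(b"0xff")])                    # leaf
```

Normalising before insertion and before lookup means both spellings reach the same dictionary entry.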
Martijn Pieters answered Sep 28 '22 03:09


It seems that what you were trying to do is get a byte string representing the value 0b11111111 (or 255). This is not what b'0b11111111' does: that literal is a ten-byte string holding the ASCII encoding of the character (Unicode) string '0b11111111', one byte per character.

What you want would be written as b'\xff'. You can check that it is actually one byte: len(b'\xff') == 1.
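
A short sketch of how one-byte keys could be built from either integer spelling, assuming the goal is the dictionary indexing described in the question. The names key_from_binary, key_from_hex, and nodes are made up for illustration.

```python
# bytes() applied to a list of ints builds one byte per int (each must be 0-255).
key_from_binary = bytes([0b11111111])
key_from_hex = bytes([0xff])

# Both integer literals denote 255, so both keys are the same single byte.
assert key_from_binary == key_from_hex == b"\xff"
assert len(key_from_binary) == 1

# Hypothetical tree level indexed by one-byte keys.
nodes = {key_from_binary: "child"}
print(nodes[key_from_hex])  # child
```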

To convert a Python int to a binary representation, you can use the ctypes library. You need to choose one of the C integer types, e.g.:

>>> import ctypes
>>> bytes(ctypes.c_ubyte(255))
b'\xff'

>>> bytes(ctypes.c_ubyte(0xff))
b'\xff'

>>> bytes(ctypes.c_long(255))
b'\xff\x00\x00\x00\x00\x00\x00\x00'

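Note: The c_long output above comes from a little-endian platform where a C long is 64 bits wide; the length and byte order differ elsewhere (on Windows, for example, c_long is 32 bits). For fixed sizes, use c_uint8 (8-bit unsigned C integer, an alias for c_ubyte) and c_int64 (64-bit signed C integer, which is an alias for c_long only where long is 64 bits).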

To convert back:

>>> ctypes.c_ubyte.from_buffer_copy(b'\xff').value
255
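
For the platform-independence part of the question, the built-in int.to_bytes() and int.from_bytes() are an alternative to ctypes (not what this answer uses) in which the width and byte order are stated explicitly instead of being inherited from the C compiler. A minimal sketch:

```python
value = 255

# Width (in bytes) and byte order are explicit, so the result is the same everywhere.
one_byte = value.to_bytes(1, byteorder="big")
eight_little = value.to_bytes(8, byteorder="little")
eight_big = value.to_bytes(8, byteorder="big")

print(one_byte)      # b'\xff'
print(eight_little)  # b'\xff\x00\x00\x00\x00\x00\x00\x00'
print(eight_big)     # b'\x00\x00\x00\x00\x00\x00\x00\xff'

# Converting back requires the same byte order that was used to encode.
restored = int.from_bytes(eight_little, byteorder="little")
print(restored)      # 255
```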

Be careful about overflow: ctypes integer types do no overflow checking, so an out-of-range value wraps around silently:

>>> ctypes.c_ubyte(256)
c_ubyte(0)
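
If silent wraparound is a concern, one option (an alternative to ctypes, sketched here with a hypothetical helper named to_single_byte) is int.to_bytes(), which raises OverflowError when the value does not fit in the requested width:

```python
def to_single_byte(value: int) -> bytes:
    """Return value as one unsigned byte; raises OverflowError if it does not fit."""
    return value.to_bytes(1, byteorder="big")


print(to_single_byte(255))  # b'\xff'

try:
    to_single_byte(256)
except OverflowError as exc:
    # 256 needs nine bits, so it is rejected rather than wrapped to 0.
    print("rejected:", exc)
```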
ondra.cifka answered Sep 28 '22 03:09