Python Comparison of byte literals

This question arose because I was trying to use bytes objects as dictionary keys, and bytes values that I understood to be equal weren't being treated as equal.

Why doesn't the following Python code compare equal? Aren't these two equivalent representations of the same binary data (the example was deliberately chosen to avoid endianness issues)?

b'0b11111111' == b'0xff'

I know the following evaluates true, demonstrating the equivalence:

int(b'0b11111111', 2) == int(b'0xff', 16)

But why does Python force me to know the representation? Is it related to endianness? Is there some easy way to force these to compare equal other than converting them all to, e.g., hex literals? Can anyone suggest a transparent and clear method to move between all representations in a (somewhat) platform-independent way (or am I asking too much)?

Edit:

Given the comments below, say I want to actually index a dictionary using 8 bits in the form b'0b11111111'. Why does Python expand it to ten bytes, and how do I prevent that?

This is a small piece of a large tree data structure, and expanding each index from 8 bits to 80 bits seems like a huge waste of memory.

asked Jul 19 '14 16:07

Matthew Hemke

2 Answers

Bytes can represent any number of things. Python cannot and will not guess at what your bytes might encode.

For example, int(b'0b11111111', 34) is also a valid interpretation, but that interpretation is not equal to hex FF.

The number of interpretations, in fact, is endless. The bytes could represent a series of ASCII codepoints, or image colors, or musical notes.

Until you explicitly apply an interpretation, the bytes object consists of just a sequence of values in the range 0-255. The textual representation of those bytes uses ASCII characters wherever a byte is representable as printable text:

>>> list(bytes(b'0b11111111'))
[48, 98, 49, 49, 49, 49, 49, 49, 49, 49]
>>> list(bytes(b'0xff'))
[48, 120, 102, 102]

Those byte sequences are not equal.
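For the dictionary case in the question, the consequence can be sketched as follows. The variable names are illustrative only and do not come from the question's code.

```python
# Two spellings of "255" as ASCII text: different bytes, so different keys.
binary_text = b'0b11111111'
hex_text = b'0xff'

table = {binary_text: 'stored under the binary spelling'}

same_bytes = binary_text == hex_text          # compares the raw byte values
found = hex_text in table                     # dict lookup relies on the same equality
lengths = (len(binary_text), len(hex_text))   # one byte per ASCII character

print(same_bytes, found, lengths)
```

Each ASCII character costs one byte, which is why the ten-character spelling occupies ten bytes rather than one.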

If you want to interpret these sequences explicitly as integer literals, use ast.literal_eval() on the decoded text values, and always normalise before comparing:

>>> import ast
>>> ast.literal_eval(b'0b11111111'.decode('utf8'))
255
>>> ast.literal_eval(b'0xff'.decode('utf8'))
255
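One possible way to apply that normalisation to dictionary keys is sketched below. Note the assumptions: normalise_key is a hypothetical helper name, and it swaps ast.literal_eval() for int(text, 0), which only accepts integer literals with an optional 0b, 0o or 0x prefix.

```python
def normalise_key(raw: bytes) -> int:
    """Interpret ASCII bytes such as b'0xff' or b'0b11111111' as an integer.

    Hypothetical helper, not part of the original answer.
    """
    # Base 0 tells int() to honour the 0b / 0o / 0x prefix, like a Python literal.
    return int(raw.decode('ascii'), 0)


table = {}
table[normalise_key(b'0b11111111')] = 'leaf'

# The hex spelling now finds the entry stored under the binary spelling.
value = table[normalise_key(b'0xff')]
print(value)
```

Normalising once, at the point where a key enters the dictionary, keeps every later comparison a plain integer comparison.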

answered Sep 28 '22 03:09

Martijn Pieters

It seems that what you were trying to do is get a byte string representing the value 0b11111111 (or 255). This is not what b'0b11111111' does – that actually stands for a byte string representing the character (Unicode) string '0b11111111'.

What you want would be written as b'\xff'. You can check that it is actually one byte: len(b'\xff') == 1.
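As a minimal sketch of getting that single byte without ctypes, the built-in int.to_bytes() and int.from_bytes() methods can be used. The variable names below are illustrative; the byte order argument is irrelevant for a single byte but is spelled out because older Python versions require it.

```python
# The same integer written three ways; all yield the same one-byte key.
key_from_binary = (0b11111111).to_bytes(1, 'big')
key_from_hex = (0xff).to_bytes(1, 'big')
key_from_list = bytes([255])

table = {key_from_binary: 'leaf'}
found = table[key_from_hex]                        # equal bytes, same key

# Going back from the byte string to the integer it encodes.
round_trip = int.from_bytes(key_from_binary, 'big')

print(key_from_binary, len(key_from_binary), found, round_trip)
```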

To convert a Python int to a binary representation, you can use the ctypes library. You need to choose one of the C integer types, e.g.:

>>> import ctypes
>>> bytes(ctypes.c_ubyte(255))
b'\xff'

>>> bytes(ctypes.c_ubyte(0xff))
b'\xff'

>>> bytes(ctypes.c_long(255))
b'\xff\x00\x00\x00\x00\x00\x00\x00'

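Note: Instead of c_ubyte, you can use the alias c_uint8 (an 8-bit unsigned C integer). The size of c_long is platform-dependent: the 8-byte result above assumes a platform where a C long is 64 bits, and the bytes appear in the machine's native byte order (little-endian here). Use c_int64 if you need a 64-bit signed C integer regardless of platform.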
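If the byte order and size must not depend on the platform, one alternative (not part of the original answer) is the standard struct module with an explicit byte-order prefix. This is a sketch under that assumption; the variable names are illustrative.

```python
import struct

# '<' = little-endian, '>' = big-endian; 'q' = signed 64-bit, 'B' = unsigned 8-bit.
# With an explicit prefix the sizes are fixed, whatever the host platform is.
little = struct.pack('<q', 255)
big = struct.pack('>q', 255)
single = struct.pack('B', 255)

# unpack() always returns a tuple, even for a single value.
(value,) = struct.unpack('<q', little)

print(little, big, single, value)
```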

To convert back:

>>> ctypes.c_ubyte.from_buffer_copy(b'\xff').value
255

Be careful about overflow:

>>> ctypes.c_ubyte(256)
c_ubyte(0)
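To avoid the silent wrap-around, one option is to check the range before constructing the ctypes value. The sketch below uses checked_ubyte, a hypothetical helper name that is not part of ctypes.

```python
import ctypes


def checked_ubyte(value: int) -> ctypes.c_ubyte:
    """Build a c_ubyte, refusing values that would wrap around.

    Hypothetical helper for illustration only.
    """
    if not 0 <= value <= 255:
        raise OverflowError(f'{value} does not fit in an unsigned byte')
    return ctypes.c_ubyte(value)


wrapped = ctypes.c_ubyte(256).value     # plain ctypes wraps modulo 256
ok = bytes(checked_ubyte(255))

try:
    checked_ubyte(256)
    rejected = False
except OverflowError:
    rejected = True

print(wrapped, ok, rejected)
```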

answered Sep 28 '22 03:09

ondra.cifka

Tags: python, comparison, base, byte, endianness
