What is the best way to compare a string object to a bytes object that works in both Python 2 and Python 3? Assume both are UTF-8. More generally, how does one write a Python 2 and Python 3 compatible comparison of two objects that may each be a string, bytes, or Unicode object?
The problem is that "asdf" == b"asdf" is True in Python 2 and False in Python 3.
Meanwhile, one cannot blindly encode or decode objects, since strings in Python 2 have both encode and decode methods, but strings in Python 3 have only an encode method.
Finally, isinstance(obj, bytes) returns True for any non-unicode string in Python 2 and returns True only for bytes objects in Python 3.
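The three observations above can be checked directly. The following is a minimal sketch; the variable names are illustrative only, and the commented results describe Python 3. On Python 2.6 or later, where bytes is an alias for str, all five values would be True instead.

```python
text = "asdf"
raw = b"asdf"

# On Python 3, str and bytes never compare equal, even with identical ASCII content.
mixed_equal = (text == raw)                   # False on Python 3

# On Python 3, str has encode but no decode; bytes has decode.
str_has_decode = hasattr(text, "decode")      # False on Python 3
bytes_has_decode = hasattr(raw, "decode")     # True on both versions

# On Python 3, isinstance(obj, bytes) is True only for real bytes objects.
text_is_bytes = isinstance(text, bytes)       # False on Python 3
raw_is_bytes = isinstance(raw, bytes)         # True on both versions
```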
In Python 2, the str type was used for two different kinds of values, text and bytes, whereas in Python 3 these are separate and incompatible types. Text contains human-readable messages, represented as a sequence of Unicode code points. Usually, it does not contain unprintable control characters such as \0.
When the source argument to bytearray() is a string, you must also give the encoding (and optionally, errors) parameters; bytearray() then converts the string to bytes using str.encode(). If the source is an integer, the array will have that size and will be initialized with null bytes.
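A short sketch of the two bytearray() cases described above. The sample values and variable names are made up for illustration; the u"" prefix is used so the same lines behave the same way on Python 2.

```python
# A text source needs an explicit encoding; the text is encoded to bytes.
from_text = bytearray(u"caf\u00e9", "utf-8")   # 5 bytes: c, a, f, 0xC3, 0xA9

# An integer source gives an array of that many null bytes.
from_int = bytearray(3)

# Omitting the encoding for a text source raises TypeError.
try:
    bytearray(u"abc")
    missing_encoding_rejected = False
except TypeError:
    missing_encoding_rejected = True
```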
Bytes objects are sequences of bytes, whereas strings are sequences of characters. Bytes are the form in which data is stored and transmitted, so a bytes object can be written to disk directly, while a string must first be encoded to bytes.
In both Python 2 and Python 3, anything that is an instance of bytes has a decode method. Thus, you can do the following:
def compare(a, b, encoding="utf8"):
    # Decode any bytes operand to text so both sides have the same type.
    if isinstance(a, bytes):
        a = a.decode(encoding)
    if isinstance(b, bytes):
        b = b.decode(encoding)
    return a == b
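The same idea can be factored into a small normalizing helper. This is only a sketch: to_text and compare_text are hypothetical names, not part of any library, and accepting bytearray as well as bytes is an added assumption beyond what the question asks for.

```python
def to_text(value, encoding="utf-8", errors="strict"):
    """Return value as text, decoding it first if it is bytes or bytearray."""
    if isinstance(value, (bytes, bytearray)):
        return value.decode(encoding, errors)
    return value


def compare_text(a, b, encoding="utf-8"):
    """Compare two objects after normalizing bytes-like operands to text."""
    return to_text(a, encoding) == to_text(b, encoding)


same_ascii = compare_text("asdf", b"asdf")
same_non_ascii = compare_text(u"caf\u00e9", b"caf\xc3\xa9")
same_bytearray = compare_text(b"abc", bytearray(b"abc"))
different = compare_text("asdf", b"asdg")
```

With the default errors="strict", input that is not valid in the given encoding raises UnicodeDecodeError rather than silently comparing unequal; passing errors="replace" to to_text is one way to avoid the exception, at the cost of lossy decoding.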