I'm working on a Python 2.6 project that is also being prepared for Python 3. Specifically, I'm working on a DIGEST-MD5 implementation.
In Python 2.6, without this import:
from __future__ import unicode_literals
I am able to write a piece of code such as this:
a1 = hashlib.md5("%s:%s:%s" % (self.username, self.domain, self.password)).digest()
a1 = "%s:%s:%s" %(a1, challenge["nonce"], cnonce )
This runs without any issues and my authentication works fine. When I try the same code with unicode_literals imported, I get an exception:
UnicodeDecodeError: 'utf8' codec can't decode byte 0xa8 in position 0: unexpected code byte
Now, I'm relatively new to Python, so I'm a bit stuck figuring this out. If I replace the %s in the format string with %r, I am able to build the string, but the authentication doesn't work. The DIGEST-MD5 spec that I read says that the 16-octet binary digest must be appended to these other strings.
Any thoughts?
The reason for the behaviour you observed is that from __future__ import unicode_literals switches the way Python treats string literals. In Python 2.x without the import, literals without the u prefix are byte strings and literals with the u prefix are unicode strings. In Python 3.x, as well as in Python 2.x with the unicode_literals future import, literals without the u prefix are unicode strings, stored in Python 2 as either UCS-2 or UCS-4 (depending on the compiler flag used when compiling Python). Literals with the b prefix are of the data type bytes, which is rather similar to the pre-3.x non-unicode string.
In Python 2, combining a byte string with a unicode string triggers an implicit conversion using the interpreter's default encoding; in your case this is UTF-8. Unless it has been changed, it is ASCII, which rejects all bytes above \x7f. Python 3 performs no implicit conversion at all; concatenating the two types raises a TypeError.
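To make the failure mode concrete, here is a minimal sketch that should run on both Python 2.6+ and Python 3. The byte values in digest_like and the helper can_decode are made up for illustration; they only stand in for the kind of raw output digest() produces. It shows that a plain literal is text once unicode_literals is active, and that raw digest bytes are generally not valid UTF-8, which is what the implicit conversion trips over.

```python
from __future__ import unicode_literals  # has no effect on Python 3

# Hypothetical stand-in for raw digest output; 0xa8 cannot start a UTF-8 sequence.
digest_like = b"\xa8\x01\x7f"


def can_decode(raw, encoding):
    """Return True if the byte string decodes cleanly with the given encoding."""
    try:
        raw.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


# With unicode_literals (or on Python 3) an unprefixed literal is not a byte string.
literal_is_text = not isinstance("abc", bytes)

print(literal_is_text)
print(can_decode(digest_like, "utf-8"))
print(can_decode(digest_like, "iso-8859-1"))
```

ISO-8859-1 accepts every byte value, which is why it appears as a fallback later in this answer.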
The message digest returned by hashlib.md5(...).digest() is a byte string, and I suppose you want the result of the whole operation to be a byte string as well. If so, convert the nonce and cnonce strings to byte strings:
# encode the credentials before hashing; Python 3 refuses to hash a unicode string
a1 = hashlib.md5(("%s:%s:%s" % (self.username, self.domain, self.password)).encode("UTF-8")).digest()
# note that UTF-8 may not be the encoding required by your counterpart, please check
# (%-formatting on a bytes literal works on Python 2.x and on Python 3.5 or later)
a1 = b"%s:%s:%s" % (a1, challenge["nonce"].encode("UTF-8"), cnonce.encode("UTF-8"))
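As a rough, self-contained sketch of the bytes-only approach, the helper below builds the same value with b":".join(...), which avoids %-formatting on bytes and so should also work on Python 3 versions before 3.5. The function name compute_a1, its parameters and the sample credentials are all hypothetical, and the UTF-8 default is an assumption that has to be checked against what the server expects.

```python
from __future__ import unicode_literals  # keeps literals as text on Python 2 as well

import hashlib


def compute_a1(username, domain, password, nonce, cnonce, charset="utf-8"):
    """Build the A1 value as a byte string (hypothetical helper; text arguments assumed)."""
    # Encode the credentials explicitly before hashing them.
    credentials = ":".join([username, domain, password]).encode(charset)
    digest = hashlib.md5(credentials).digest()  # 16 raw octets
    # Join byte strings only, so no implicit decoding can take place.
    return b":".join([digest, nonce.encode(charset), cnonce.encode(charset)])


# Made-up sample values, for illustration only.
a1 = compute_a1("alice", "example.org", "secret", "abc123", "xyz789")
print(len(a1))
```

The result stays a byte string throughout, so it can be passed straight to the next hashlib.md5(...) call in the protocol.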
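Alternatively, you can convert the byte string returned by digest() to a unicode string (not recommended). As the first 256 Unicode code points are identical to ISO-8859-1, decoding with that codec maps every byte to the code point of the same value, so this might serve your needs:
a1 = hashlib.md5(("%s:%s:%s" % (self.username, self.domain, self.password)).encode("UTF-8")).digest()
# a1 becomes a unicode string here; encode it with ISO-8859-1 again before hashing or sending it
a1 = "%s:%s:%s" % (a1.decode("ISO-8859-1"), challenge["nonce"], cnonce)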
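The ISO-8859-1 route only works if the text is later turned back into bytes with the same codec. The short check below (the sample input to md5 is made up) illustrates that the round trip is lossless for every possible byte value, while encoding the decoded text as UTF-8 instead would change the octets.

```python
import hashlib

# Made-up input, only used to obtain some raw digest bytes.
digest = hashlib.md5(b"user:example.org:secret").digest()

# Decode to text and encode back with the same codec.
as_text = digest.decode("iso-8859-1")
round_trip = as_text.encode("iso-8859-1")

# Check the property for all 256 byte values, not just this one digest.
all_bytes = bytes(bytearray(range(256)))
lossless = all_bytes.decode("iso-8859-1").encode("iso-8859-1") == all_bytes

# Encoding the same text as UTF-8 expands every byte from 0x80 upwards to two bytes.
utf8_encoded = all_bytes.decode("iso-8859-1").encode("utf-8")

print(round_trip == digest, lossless, len(utf8_encoded))
```

So if the combined string is built as text, it would need to be encoded with ISO-8859-1, not UTF-8, before it is hashed again, which is one reason the bytes-only variant is usually easier to get right.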