
Convert hash.digest() to unicode

import hashlib
string1 = u'test'
hashstring = hashlib.md5()
hashstring.update(string1)
string2 = hashstring.digest()

unicode(string2)

UnicodeDecodeError: 'ascii' codec can't decode byte 0x8f in position 1: ordinal
not in range(128)

The string HAS to be unicode for it to be of any use to me. Can this be done? I'm using Python 2.7, if that helps.

jbaranski asked Jun 06 '11 20:06


2 Answers

Ignacio just gave the perfect answer. As a complement: when you convert a byte string containing characters outside ASCII to unicode, you have to pass its encoding as a parameter:

>>> unicode("órgão")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)
>>> unicode("órgão", "UTF-8")
u'\xf3rg\xe3o'

If you cannot say what the original encoding is (UTF-8 in my example), you really cannot convert to Unicode. That is a sign that something is not quite right with what you are trying to do.
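A minimal sketch of why the encoding matters, using escaped byte literals so it does not depend on the terminal or source-file encoding. The variable names are illustrative, and Latin-1 is only picked as an example of a wrong guess:

```python
# The UTF-8 bytes of the word from the session above, written with escapes.
utf8_bytes = b'\xc3\xb3rg\xc3\xa3o'

# Decoding with the correct encoding yields the five intended characters.
right = utf8_bytes.decode('utf-8')

# Decoding with a wrong guess does not fail, but gives seven garbage
# characters: every byte is mapped to its own code point.
wrong = utf8_bytes.decode('latin-1')

# Decoding as ASCII (what unicode() does by default in Python 2) fails.
try:
    utf8_bytes.decode('ascii')
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False
```

The same bytes give different text under different encodings, which is why the conversion is not meaningful unless the encoding is known.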

Last but not least, encodings are pretty confusing stuff. This comprehensive text about them can make them clear.

brandizzi answered Oct 12 '22 00:10


The result of .digest() is a bytestring¹, so converting it to Unicode is pointless. Use .hexdigest() if you want a readable representation.

¹ Some bytestrings can be converted to Unicode, but the bytestrings returned by .digest() do not contain textual data. They can contain any byte including the null byte: they're usually not printable without using escape sequences.
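As a rough illustration of the difference, the sketch below hashes the question's input and turns the hex form into a unicode string. It goes through binascii.hexlify so that the same lines run on both Python 2.7 and Python 3; on 2.7 alone, hashstring.hexdigest().decode('ascii') would do the same job. Encoding the input as UTF-8 before hashing is an assumption about what the asker wants:

```python
import binascii
import hashlib

hasher = hashlib.md5()
# Hash explicit bytes rather than relying on implicit ASCII encoding.
hasher.update(u'test'.encode('utf-8'))

raw = hasher.digest()  # 16 raw bytes, not text

# The raw digest is not valid text: byte 0x8f at position 1 is not ASCII.
try:
    raw.decode('ascii')
    raw_is_ascii = True
except UnicodeDecodeError:
    raw_is_ascii = False

# The hex form contains only 0-9 and a-f, so decoding it as ASCII is safe.
hex_text = binascii.hexlify(raw).decode('ascii')
```

Here hex_text is a 32-character unicode string holding the same characters that hasher.hexdigest() returns.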

Ignacio Vazquez-Abrams answered Oct 12 '22 00:10