
Python: TypeError: Unicode-objects must be encoded before hashing

Tags: python, sha256

I am trying to read in a file of passwords, compute the hash of each password, and compare it to a hash I already have to determine whether I have discovered the password. However, the error message I keep getting is "TypeError: Unicode-objects must be encoded before hashing". Here is my code:

from hashlib import sha256

with open('words','r') as f:
    for line in f:
        hashedWord = sha256(line.rstrip()).hexdigest()
        if hashedWord == 'ca52258a43795ab5c89513f9984b8f3d3d0aa61fb7792ecefe8d90010ee39f2':
            print(line + "is one of the words!")

Can anyone please help and provide an explanation?

user3479683 asked Oct 23 '14 23:10



2 Answers

The error message means exactly what it says: You have a Unicode string. You can't SHA-256-hash a Unicode string, you can only hash bytes.
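A minimal sketch of that distinction, assuming Python 3. The helper name can_hash and the sample password are hypothetical, chosen only for illustration. The exact wording of the exception message differs between Python versions, so the sketch checks only the exception type:

```python
from hashlib import sha256

def can_hash(value):
    """Return True if sha256 accepts `value`, False if it raises TypeError.

    `can_hash` is a hypothetical helper used only for this illustration.
    """
    try:
        sha256(value)
        return True
    except TypeError:
        return False

print(can_hash('password'))                  # a str (Unicode string) is rejected
print(can_hash(b'password'))                 # a bytes object is accepted
print(can_hash('password'.encode('utf-8')))  # an encoded str is bytes, so it is accepted
```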

But why do you have a Unicode string? Because you're opening a file in text mode, which means you're implicitly asking Python to decode the bytes in that file (using your default encoding) to Unicode. If you want to get the raw bytes, you have to use binary mode.

In other words, just change this line:

with open('words','r') as f: 

… to:

with open('words', 'rb') as f: 

You may notice that, once you fix this, the print line raises an exception. Why? Because you're trying to add a bytes object to a str. You're also missing a space, and you're printing the un-stripped line. You could fix all of those by passing two arguments to print (as in print(line.rstrip(), "is one of the words")).

But then you'll get output like b'\xc3\x85rhus' is one of the words when you wanted it to print out Århus is one of the words. That's because you now have bytes, not strings. Since Python is no longer decoding for you, you'll need to do that manually. In Python 3, calling decode without an argument uses UTF-8, which is often, but not always, the same encoding that open falls back to when you don't specify one. If your file uses a different encoding, pass it to decode explicitly. So:

print(line.rstrip().decode(), "is one of the words") 
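Putting the pieces of this answer together, the binary-mode approach might look like the sketch below. The function name find_matches and its parameters are hypothetical, and the sketch assumes the word list is UTF-8 encoded:

```python
from hashlib import sha256

def find_matches(path, target_hex):
    """Return the words in the file at `path` whose SHA-256 hex digest equals `target_hex`.

    `find_matches` is a hypothetical helper; the file is assumed to be UTF-8.
    """
    matches = []
    with open(path, 'rb') as f:            # binary mode: each line is a bytes object
        for line in f:
            word = line.rstrip()           # drop the trailing newline (still bytes)
            if sha256(word).hexdigest() == target_hex:
                # Decode only for display; hashing was done on the raw bytes.
                matches.append(word.decode('utf-8'))
    return matches

if __name__ == '__main__':
    import os
    if os.path.exists('words'):
        # Target digest copied as-is from the question.
        target = 'ca52258a43795ab5c89513f9984b8f3d3d0aa61fb7792ecefe8d90010ee39f2'
        for word in find_matches('words', target):
            print(word, "is one of the words")
```

Hashing the raw bytes means the digest does not depend on Python guessing the file's encoding; only the final decode for printing does.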
abarnert answered Sep 26 '22 00:09


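If you want to keep reading the file in text mode, so that each line is a Unicode string, encode the string before hashing it. Strip the trailing newline first, as the original code does:
hashedWord = sha256(line.rstrip().encode('utf-8')).hexdigest()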
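As a slightly fuller sketch of this text-mode approach, the example below opens the file with an explicit encoding and uses the same encoding when hashing. The names sha256_hex and find_matches_text are hypothetical, and UTF-8 is an assumption about how the word list was saved:

```python
from hashlib import sha256

def sha256_hex(text, encoding='utf-8'):
    """Encode a str to bytes and return its SHA-256 digest as a hex string."""
    return sha256(text.encode(encoding)).hexdigest()

def find_matches_text(path, target_hex, encoding='utf-8'):
    """Return the words in a text file whose SHA-256 hex digest equals `target_hex`.

    Both helpers are hypothetical; `encoding` must match how the file was written.
    """
    matches = []
    with open(path, 'r', encoding=encoding) as f:   # text mode: each line is a str
        for line in f:
            word = line.rstrip()                    # drop the trailing newline
            if sha256_hex(word, encoding) == target_hex:
                matches.append(word)
    return matches
```

Passing the encoding explicitly to both open and encode keeps the hashed bytes identical to the bytes stored in the file, regardless of the platform's default encoding.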

Cloud Cho answered Sep 23 '22 00:09