Python: TypeError: Unicode-objects must be encoded before hashing

Tags: python, sha256

I am trying to read in a file of passwords, compute the hash of each one, and compare it to a hash I already have to determine whether I have discovered the password. However, the error message I keep getting is "TypeError: Unicode-objects must be encoded before hashing". Here is my code:

from hashlib import sha256

with open('words','r') as f:
    for line in f:
        hashedWord = sha256(line.rstrip()).hexdigest()
        if hashedWord == 'ca52258a43795ab5c89513f9984b8f3d3d0aa61fb7792ecefe8d90010ee39f2':
            print(line + "is one of the words!")

Can anyone please help and provide an explanation?


asked Oct 23 '14 23:10

user3479683

2 Answers

The error message means exactly what it says: You have a Unicode string. You can't SHA-256-hash a Unicode string, you can only hash bytes.
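To see the difference in isolation, here is a minimal sketch. The sample word is made up for illustration, and UTF-8 is an assumed encoding, not something the question states.

```python
from hashlib import sha256

word = "hunter2"  # made-up sample password, not from the question

# Passing a str straight to sha256 raises TypeError.
try:
    sha256(word)
    raised = False
except TypeError:
    raised = True

# Encoding the str to bytes first works (UTF-8 is an assumption here).
digest = sha256(word.encode("utf-8")).hexdigest()

print(raised, len(digest))
```

The exact wording of the TypeError message varies between Python versions, so the sketch only checks that the exception is raised.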

But why do you have a Unicode string? Because you're opening a file in text mode, which means you're implicitly asking Python to decode the bytes in that file (using your default encoding) to Unicode. If you want to get the raw bytes, you have to use binary mode.

In other words, just change this line:

with open('words','r') as f:

… to:

with open('words', 'rb') as f:
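As a rough sketch of what binary mode gives you: each line comes back as a bytes object that can be hashed directly. The temporary file, its contents, and the target hash below are all invented for the example; they stand in for the 'words' file and the hash from the question.

```python
import os
import tempfile
from hashlib import sha256

# Build a throwaway word list (hypothetical contents, UTF-8 encoded).
with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
    tmp.write("hunter2\nÅrhus\n".encode("utf-8"))
    path = tmp.name

# Pretend this is the hash we are trying to crack.
target = sha256("Århus".encode("utf-8")).hexdigest()

matches = []
with open(path, "rb") as f:        # binary mode: no decoding happens
    for line in f:
        word = line.rstrip()       # still bytes, trailing newline removed
        if sha256(word).hexdigest() == target:
            matches.append(word)

os.remove(path)
print(matches)
```

Note that the matches are bytes, not str, which is what the next part of the answer deals with.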

You may notice that, once you fix this, the print line raises an exception. Why? Because you're trying to concatenate a bytes object with a str. You're also missing a space, and you're printing the un-stripped line. You could fix all of those by passing two arguments to print (as in print(line.rstrip(), "is one of the words")).

But then you'll get output like b'\xc3\x85rhus' is one of the words when you wanted Århus is one of the words. That's because you now have bytes, not strings. Since Python is no longer decoding for you, you'll need to do that manually. Calling decode without an argument uses UTF-8, which is not necessarily the locale-dependent default that open uses in text mode; if the file is in a different encoding, pass its name to decode. So:

print(line.rstrip().decode(), "is one of the words")
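A small sketch of the difference, using a hard-coded byte string in place of a line read from the file, and assuming the file is UTF-8:

```python
raw = b"\xc3\x85rhus\n"          # stand-in for one line read in binary mode

word = raw.rstrip()              # bytes, newline stripped
shown_as_bytes = repr(word)      # what print() would show for a bytes object

decoded = word.decode("utf-8")   # explicit encoding; UTF-8 is assumed here
message = f"{decoded} is one of the words"

print(shown_as_bytes)
print(message)
```

Naming the encoding explicitly makes the assumption visible, which helps if the word list turns out to be in something like Latin-1.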


answered Sep 26 '22 00:09

abarnert

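If you want to keep reading the file in text mode (so each line is a Unicode string), encode the stripped line before hashing it. Note the rstrip(): without it, the trailing newline would be hashed too and nothing would match:
hashedWord = sha256(line.rstrip().encode('utf-8')).hexdigest()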
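Putting the text-mode approach together, one possible sketch looks like this. The function name find_matches, its parameters, and the sample file are hypothetical, and UTF-8 is assumed both for the file and for the bytes that were originally hashed.

```python
import os
import tempfile
from hashlib import sha256


def find_matches(path, target_hash, encoding="utf-8"):
    """Return the words in `path` whose SHA-256 hex digest equals `target_hash`."""
    matches = []
    # Text mode with an explicit encoding, so lines arrive as str.
    with open(path, "r", encoding=encoding) as f:
        for line in f:
            word = line.rstrip()  # drop the trailing newline before hashing
            if sha256(word.encode(encoding)).hexdigest() == target_hash:
                matches.append(word)
    return matches


# Usage with a throwaway word list (made-up contents).
with tempfile.NamedTemporaryFile("w", encoding="utf-8", delete=False) as tmp:
    tmp.write("hunter2\nÅrhus\n")
    sample_path = tmp.name

target = sha256("Århus".encode("utf-8")).hexdigest()
found = find_matches(sample_path, target)
os.remove(sample_path)

for word in found:
    print(word, "is one of the words")
```

Because the words stay as str throughout, they can be printed directly without a separate decode step.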

answered Sep 23 '22 00:09

Cloud Cho
