With python 2.7 the following code computes the mD5 hexdigest of the content of a file.
(EDIT: well, not really as answers have shown, I just thought so).
import hashlib def md5sum(filename): f = open(filename, mode='rb') d = hashlib.md5() for buf in f.read(128): d.update(buf) return d.hexdigest()
Now if I run that code using python3 it raise a TypeError Exception:
d.update(buf) TypeError: object supporting the buffer API required
I figured out that I could make that code run with both python2 and python3 changing it to:
def md5sum(filename): f = open(filename, mode='r') d = hashlib.md5() for buf in f.read(128): d.update(buf.encode()) return d.hexdigest()
Now I still wonder why the original code stopped working. It seems that when opening a file using the binary mode modifier it returns integers instead of strings encoded as bytes (I say that because type(buf) returns int). Is this behavior explained somewhere ?
# Import hashlib library (md5 method is part of it) import hashlib # File to check file_name = 'filename.exe' # Correct original md5 goes here original_md5 = '5d41402abc4b2a76b9719d911017c592' # Open,close, read file and calculate MD5 on its contents with open(file_name, 'rb') as file_to_check: # read contents of the ...
The MD5, defined in RFC 1321, is a hash algorithm to turn inputs into a fixed 128-bit (16 bytes) length of the hash value. Note. MD5 is not collision-resistant – Two different inputs may producing the same hash value. Read this MD5 vulnerabilities. In Python, we can use hashlib.
I think you wanted the for-loop to make successive calls to f.read(128)
. That can be done using iter() and functools.partial():
import hashlib from functools import partial def md5sum(filename): with open(filename, mode='rb') as f: d = hashlib.md5() for buf in iter(partial(f.read, 128), b''): d.update(buf) return d.hexdigest() print(md5sum('utils.py'))
for buf in f.read(128): d.update(buf)
.. updates the hash sequentially with each of the first 128 bytes values of the file. Since iterating over a bytes
produces int
objects, you get the following calls which cause the error you encountered in Python3.
d.update(97) d.update(98) d.update(99) d.update(100)
which is not what you want.
Instead, you want:
def md5sum(filename): with open(filename, mode='rb') as f: d = hashlib.md5() while True: buf = f.read(4096) # 128 is smaller than the typical filesystem block if not buf: break d.update(buf) return d.hexdigest()
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With