
Using hashlib to compute md5 digest of a file in Python 3


With Python 2.7, the following code computes the MD5 hexdigest of the content of a file.

(EDIT: well, not really, as the answers have shown; I just thought so.)

    import hashlib

    def md5sum(filename):
        f = open(filename, mode='rb')
        d = hashlib.md5()
        for buf in f.read(128):
            d.update(buf)
        return d.hexdigest()

Now if I run that code using Python 3, it raises a TypeError exception:

        d.update(buf)
    TypeError: object supporting the buffer API required

I figured out that I could make that code run with both Python 2 and Python 3 by changing it to:

    def md5sum(filename):
        f = open(filename, mode='r')
        d = hashlib.md5()
        for buf in f.read(128):
            d.update(buf.encode())
        return d.hexdigest()

Now I still wonder why the original code stopped working. It seems that when a file is opened with the binary mode modifier, it returns integers instead of strings encoded as bytes (I say that because type(buf) returns int). Is this behavior explained somewhere?

kriss asked Oct 19 '11 23:10




2 Answers

I think you wanted the for-loop to make successive calls to f.read(128). That can be done using iter() and functools.partial():

    import hashlib
    from functools import partial

    def md5sum(filename):
        with open(filename, mode='rb') as f:
            d = hashlib.md5()
            for buf in iter(partial(f.read, 128), b''):
                d.update(buf)
        return d.hexdigest()

    print(md5sum('utils.py'))
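As a minimal sketch of why this works: the two-argument form iter(callable, sentinel) calls the callable repeatedly and stops as soon as it returns the sentinel, and read(128) returns b'' at end of file. The example below uses an in-memory io.BytesIO stream and made-up sample data in place of a real file; both are assumptions for illustration only.

```python
import hashlib
import io
from functools import partial

# Hypothetical sample data standing in for a file's contents (300 bytes).
data = b"abc" * 100
stream = io.BytesIO(data)

# iter(callable, sentinel) keeps calling stream.read(128) until it returns b"".
chunks = list(iter(partial(stream.read, 128), b""))
chunk_sizes = [len(c) for c in chunks]  # 128 + 128 + 44 = 300 bytes

# Feeding the chunks one by one gives the same digest as hashing all at once.
d = hashlib.md5()
for chunk in chunks:
    d.update(chunk)
assert d.hexdigest() == hashlib.md5(data).hexdigest()
```

Each item produced by the loop is a bytes object of up to 128 bytes, which is what update() expects.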
Raymond Hettinger answered Oct 09 '22 15:10


    for buf in f.read(128):
        d.update(buf)

... updates the hash sequentially with each of the first 128 byte values of the file. Since iterating over a bytes object produces int objects, you get the following calls, which cause the error you encountered in Python 3:

    d.update(97)
    d.update(98)
    d.update(99)
    d.update(100)

which is not what you want.

Instead, you want:

    import hashlib

    def md5sum(filename):
        with open(filename, mode='rb') as f:
            d = hashlib.md5()
            while True:
                buf = f.read(4096)  # 128 is smaller than the typical filesystem block
                if not buf:
                    break
                d.update(buf)
            return d.hexdigest()
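The difference can be reproduced in Python 3 without a file. The following is only a sketch: the sample data and the name first_chunk are made up for illustration and stand in for what f.read(128) returns in binary mode.

```python
import hashlib

# Hypothetical stand-in for the bytes object returned by f.read(128).
first_chunk = b"abcd"

# Iterating over bytes yields integers in Python 3.
values = [b for b in first_chunk]             # [97, 98, 99, 100]
value_types = {type(b) for b in first_chunk}  # {int}

# Passing one of those ints to update() raises TypeError.
try:
    hashlib.md5().update(values[0])
    raised = False
except TypeError:
    raised = True

# Passing the whole bytes object works.
digest = hashlib.md5(first_chunk).hexdigest()
```

In Python 2, iterating over a str yields one-character strings, so the original loop ran without error but still hashed only the first 128 bytes of the file.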
phihag answered Oct 09 '22 15:10