Python, get base64-encoded MD5 hash of an image object

Tags: python, hash, md5

I need to get a base64-encoded MD5 hash of an object, where the object is an image stored as a file, fname.

I've tried this:

def get_md5(fname):
    hash = hashlib.md5()
    with open(fname) as f:
        for chunk in iter(lambda: f.read(4096), ""):
            hash.update(chunk)
    return hash.hexdigest().encode('base64').strip()

However, I don't think this is right because it returns a string with too many characters. My understanding is that it needs to be 24 characters long. I get

NjJiM2RlOWMzOTYxYmM3MDI5Y2Q1NzdjOTQ5YWRlYTQ=

I've tried a few other similar ways as well, for example, one that does not do the chunk loop thing. They all return the same string.

(My later actions that need the base64-encoded MD5 hash fail, and I'm thinking this could be why.)

asked Aug 16 '15 19:08

2 Answers

I was able to make it work by using digest() instead of hexdigest(). In Python 2, the last line then becomes:

return hash.digest().encode('base64').strip()

The result was then 24 characters long, and it was accepted by Google Cloud Storage transfer, which required a base64-encoded MD5 hash.

For Python 3 (from the comment below):

import base64
return base64.b64encode(hash.digest()).decode()
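For reference, here is roughly what the whole function might look like in Python 3. This is a sketch rather than the exact code I ran: the name get_md5_base64 and the chunk_size parameter are made up for illustration. Note that in Python 3 the file has to be opened in binary mode and the iter() sentinel has to be b"" instead of "".

```python
import base64
import hashlib


def get_md5_base64(fname, chunk_size=4096):
    """Return the base64-encoded MD5 digest of the file at fname."""
    md5 = hashlib.md5()
    # Binary mode matters: the hash works on bytes, not decoded text.
    with open(fname, "rb") as f:
        # iter() keeps calling f.read() until it returns b"" (end of file).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
    # digest() is the raw 16 bytes; base64 turns that into 24 characters.
    return base64.b64encode(md5.digest()).decode("ascii")
```

Calling get_md5_base64(fname) on any file should give a 24-character string ending in "==".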

answered Oct 09 '22 20:10

user984003

First, base64 encoding makes strings longer. (Example using IPython with Python 3):

In [1]: s = '123456789012345678901234'

In [2]: len(s)
Out[2]: 24

In [3]: import base64

In [4]: e = base64.b64encode(s.encode('utf8'))

In [5]: len(e)
Out[5]: 32

In [6]: e
Out[6]: b'MTIzNDU2Nzg5MDEyMzQ1Njc4OTAxMjM0'

With base64 encoding you get 8 bits of output for every 6 bits of input.

In [7]: 32/24
Out[7]: 1.3333333333333333

In [8]: 8/6
Out[8]: 1.3333333333333333
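Because of padding, the exact output length is slightly more than 4/3 of the input whenever the input length is not a multiple of 3. A small sketch of the rule for standard padded base64 (the helper name b64_length is made up for this example):

```python
import base64
import math


def b64_length(n_bytes):
    """Length of the padded base64 encoding of n_bytes bytes."""
    # Every group of up to 3 input bytes becomes 4 output characters.
    return 4 * math.ceil(n_bytes / 3)


print(b64_length(24))  # 32, matching len(e) in the session above
```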

The base64 alphabet uses 64 (or 2**6) different symbols. The lower- and uppercase letters and the digits 0-9 account for 62 of them, which leaves two more symbols to choose plus a padding character. Usually + and / are the extra symbols and = is the padding, but there are variants, mainly because / is not allowed in UNIX or MS-Windows filenames.
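To illustrate one such variant, the standard library also provides a URL- and filename-safe alphabet that swaps + and / for - and _. The input bytes below are picked only because they happen to hit both extra symbols:

```python
import base64

raw = b"\xfb\xff"  # chosen so the encoding uses both extra symbols

standard = base64.b64encode(raw).decode("ascii")
url_safe = base64.urlsafe_b64encode(raw).decode("ascii")

print(standard)  # +/8=
print(url_safe)  # -_8=
```

Whichever alphabet is used for encoding has to be used for decoding as well.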

Second, using a hexadecimal representation doubles the length of a byte string, because each byte is written as two hex digits, from 00 to FF. Example (again using IPython and Python 3):

In [1]: import hashlib

In [2]: s = b'this is a simple test'

In [3]: len(hashlib.md5(s).digest())
Out[3]: 16

In [4]: len(hashlib.md5(s).hexdigest())
Out[4]: 32

If you are going to use base64 encoding anyway, it makes no sense to use hexdigest().
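Putting the two points together explains the 44-character string in the question: it is the base64 encoding of the 32-character hex string, not of the 16 raw bytes. A quick sketch using the same test string as above (the variable names are only for illustration):

```python
import base64
import hashlib

s = b"this is a simple test"
md5 = hashlib.md5(s)

# Encoding the 16 raw bytes gives the expected 24 characters.
from_digest = base64.b64encode(md5.digest()).decode("ascii")
# Encoding the 32-character hex string gives 44 characters instead.
from_hex = base64.b64encode(md5.hexdigest().encode("ascii")).decode("ascii")

print(len(from_digest))  # 24
print(len(from_hex))     # 44

# If only the hex string is available, convert it back to raw bytes first.
recovered = base64.b64encode(bytes.fromhex(md5.hexdigest())).decode("ascii")
print(recovered == from_digest)  # True
```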

answered Oct 09 '22 20:10

Roland Smith
