Python, get base64-encoded MD5 hash of an image object

Tags: python, hash, md5

I need to get a base64-encoded MD5 hash of an object, where the object is an image stored as a file, fname.

I've tried this:

def get_md5(fname):
    hash = hashlib.md5()
    with open(fname) as f:
        for chunk in iter(lambda: f.read(4096), ""):
            hash.update(chunk)
    return hash.hexdigest().encode('base64').strip()

However, I don't think this is right because it returns a string with too many characters. My understanding is that it needs to be 24 characters long. I get

NjJiM2RlOWMzOTYxYmM3MDI5Y2Q1NzdjOTQ5YWRlYTQ=

I've tried a few other similar ways as well, for example, one that does not do the chunk loop thing. They all return the same string.

(My later actions that need the base64-encoded MD5 hash fail, and I'm thinking this could be why.)

user984003 asked Aug 16 '15 19:08



2 Answers

I was able to make it work by using digest() instead of hexdigest(). Then the last line becomes:

return hash.digest().encode('base64').strip()

The result was then 24 characters long, and it was accepted by Google Cloud Storage transfer, which required a base64-encoded MD5 hash.
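To see what went wrong in the question, the 44-character string from it can be decoded again. A quick check, written in Python 3 syntax; the variable names are only illustrative:

```python
import base64

# The value quoted in the question.
wrong = "NjJiM2RlOWMzOTYxYmM3MDI5Y2Q1NzdjOTQ5YWRlYTQ="

# Decoding gives back the 32-character hex digest, not the raw MD5 bytes.
hex_digest = base64.b64decode(wrong).decode("ascii")
print(hex_digest)    # 62b3de9c3961bc7029cd577c949adea4

# Turning the hex text into its 16 raw bytes and encoding those
# gives the 24-character form.
right = base64.b64encode(bytes.fromhex(hex_digest)).decode("ascii")
print(len(right))    # 24
```

So the original code base64-encoded the hex text of the hash rather than the hash itself.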

For Python 3 (from the comment below):

import base64  # at the top of the file

# last line of get_md5(); the file must be opened in binary mode ('rb')
return base64.b64encode(hash.digest()).decode()
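Put together as a whole function, a Python 3 version might look like the sketch below. It assumes the file is opened in binary mode and that the read loop stops on an empty bytes object (b"") rather than an empty string, two details the snippets above do not show. The name get_md5_b64 and the chunk_size parameter are only illustrative.

```python
import base64
import hashlib


def get_md5_b64(fname, chunk_size=4096):
    """Return the base64-encoded MD5 digest of the file at fname as a str."""
    md5 = hashlib.md5()
    # Binary mode is needed in Python 3: hashlib only accepts bytes.
    with open(fname, "rb") as f:
        # iter() with a sentinel stops once read() returns b"" at end of file.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
    # digest() is the raw 16 bytes; base64 turns them into 24 characters.
    return base64.b64encode(md5.digest()).decode("ascii")
```

Keeping the "" sentinel from the question while reading in binary mode would loop forever, because read() returns b"" at end of file and that never compares equal to "".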
user984003 answered Oct 09 '22 20:10


First, base64 encoding makes strings longer. (Example using IPython with Python 3):

In [1]: s = '123456789012345678901234'

In [2]: len(s)
Out[2]: 24

In [3]: import base64

In [4]: e = base64.b64encode(s.encode('utf8'))

In [5]: len(e)
Out[5]: 32

In [6]: e
Out[6]: b'MTIzNDU2Nzg5MDEyMzQ1Njc4OTAxMjM0'

With base64 encoding you get 8 bits of output for every 6 bits of input.

In [7]: 32/24
Out[7]: 1.3333333333333333

In [8]: 8/6
Out[8]: 1.3333333333333333
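Because padding rounds the output up to whole groups of four characters, the exact encoded length for n input bytes is 4 * ceil(n / 3). A small sketch of that rule; the helper name b64_len is made up for this example:

```python
import base64


def b64_len(n):
    """Length of the padded base64 encoding of n bytes: 4 * ceil(n / 3)."""
    return 4 * ((n + 2) // 3)


# Compare the formula with what base64.b64encode() really produces.
for n in (1, 2, 3, 24):
    encoded = base64.b64encode(bytes(n))
    print(n, len(encoded), b64_len(n))
```

For the 24-byte string above this gives 32, matching Out[5].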

The base64 alphabet uses 64 (or 2**6) different symbols. The upper- and lowercase letters and the digits 0-9 account for 62 of them, which leaves two more symbols plus a padding character. Usually + and / are the two extra symbols and = is the padding, but there are variations, especially since / is not allowed in UNIX or MS-Windows filenames.
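One such variation ships with Python's base64 module: urlsafe_b64encode() uses - and _ in place of + and /. A short illustration, with two input bytes picked only because their standard encoding happens to contain both extra symbols:

```python
import base64

# Chosen so that the standard encoding contains both '+' and '/'.
raw = bytes([0xFB, 0xFF])

standard = base64.b64encode(raw)
url_safe = base64.urlsafe_b64encode(raw)

print(standard)   # b'+/8='
print(url_safe)   # b'-_8='
```

Both forms decode to the same bytes, so which alphabet to use depends on what the receiving side expects.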

Second, using a hexadecimal representation doubles the length of a byte string, because every byte is written as two hex digits, from 00 to FF. Example (again using IPython and Python 3):

In [1]: import hashlib

In [2]: s = b'this is a simple test'

In [3]: len(hashlib.md5(s).digest())
Out[3]: 16

In [4]: len(hashlib.md5(s).hexdigest())
Out[4]: 32

If you are going to use base64 encoding anyway, it makes no sense to use hexdigest().
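Combining both points for the test string used above, as a plain script rather than an IPython session (variable names are only for illustration):

```python
import base64
import hashlib

md5 = hashlib.md5(b'this is a simple test')

# 16 raw bytes -> 24 base64 characters (22 significant + 2 '=' padding).
from_digest = base64.b64encode(md5.digest())

# 32 hex characters -> 44 base64 characters (43 significant + 1 '=' padding).
from_hexdigest = base64.b64encode(md5.hexdigest().encode('ascii'))

print(len(from_digest), len(from_hexdigest))   # 24 44
```

The 44-character result has the same length as the string in the question, which is what you get when hexdigest() is encoded instead of digest().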

Roland Smith answered Oct 09 '22 20:10