MD5 hash discrepancy between Python and PHP?


I'm trying to create a checksum of a binary file (flv/f4v, etc.) to verify the contents of the file between the server and client computers. The application running on the client computer is Python-based, while the server uses PHP.

PHP code is as follows:

$fh = fopen($filepath, 'rb');
$contents = fread($fh, filesize($filepath));
$checksum = md5(base64_encode($contents));
fclose($fh);

Python code is as follows:

import hashlib

def _get_md5(filepath):
    fh = open(filepath, 'rb')
    md5 = hashlib.md5()
    # Python 2 'base64' codec; the handle is named fh throughout
    md5.update(fh.read().encode('base64'))
    checksum = md5.hexdigest()
    fh.close()
    return checksum

On the particular file I'm testing, the PHP and Python MD5 hash strings are as follows, respectively:

cfad0d835eb88e5342e843402cc42764
0a96e9cc3bb0354d783dfcb729248ce0

The server is running CentOS, while the client is a Mac OS X environment. I would greatly appreciate any help in understanding why the two are generating different hash results, or whether it is something I overlooked (I am relatively new to Python...). Thank you!

[post mortem: the problem was ultimately the difference between Python and PHP's base64 encoding varieties. MD5 works the same between the two scripting platforms (at least using .hexdigest() in Python).]

1 Answer

I would rather assume that the base64 implementations differ.

PHP:

php -r 'var_dump(base64_encode(str_repeat("x", 10)));'
string(16) "eHh4eHh4eHh4eA=="

Python (note the trailing newline):

>>> ("x" * 10).encode('base64')
'eHh4eHh4eHh4eA==\n'
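To make the difference concrete, here is a minimal sketch in Python 3, where the old 'base64' string codec is no longer available on `str`. It assumes that `base64.encodebytes` reproduces the MIME-style output seen above (a newline after every 76 output characters plus a trailing one) and that `base64.b64encode` yields the unbroken string PHP's `base64_encode` produces. The function names `php_style_md5`, `mime_style_md5`, and `get_md5` are hypothetical helpers, not part of either library.

```python
import base64
import hashlib


def php_style_md5(data: bytes) -> str:
    """MD5 of unbroken base64, mirroring PHP's md5(base64_encode($contents))."""
    return hashlib.md5(base64.b64encode(data)).hexdigest()


def mime_style_md5(data: bytes) -> str:
    """MD5 of MIME-style base64, which contains embedded and trailing newlines."""
    return hashlib.md5(base64.encodebytes(data)).hexdigest()


def get_md5(filepath: str) -> str:
    """Checksum of a file that should agree with the PHP snippet in the question."""
    with open(filepath, "rb") as fh:  # closes the handle even on error
        return php_style_md5(fh.read())


sample = b"x" * 10
print(base64.b64encode(sample))    # b'eHh4eHh4eHh4eA=='
print(base64.encodebytes(sample))  # b'eHh4eHh4eHh4eA==\n'

# The extra newline alone is enough to change the digest.
print(php_style_md5(sample) == mime_style_md5(sample))  # False
```

If both ends are under your control, hashing the raw bytes directly (`md5_file($filepath)` in PHP, `hashlib.md5(data).hexdigest()` in Python) avoids the base64 step and this whole class of mismatch.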
