 

Python 2 vs 3: same input, different MD5 hash

Python 3 code:

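from binascii import b2a_hex
from Crypto.Hash import MD5  # assumed: PyCrypto / PyCryptodome

def md5hex(data):
    """ return hex string of md5 of the given string """
    h = MD5.new()
    h.update(data.encode('utf-8'))  # text -> UTF-8 bytes before hashing
    return b2a_hex(h.digest()).decode('utf-8')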
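For comparison, here is a minimal sketch of the same Python 3 function using the standard-library hashlib module instead of MD5.new() (swapping the library is an assumption; the question doesn't show where MD5 is imported from). It also checks that b2a_hex(h.digest()) is just the hex digest.

```python
import hashlib
from binascii import b2a_hex


def md5hex(data):
    """Return the hex MD5 of a text string, hashing its UTF-8 bytes (hashlib version)."""
    h = hashlib.md5()
    h.update(data.encode('utf-8'))
    return b2a_hex(h.digest()).decode('utf-8')


text = 'bf5\xa47\xa48\xa43'  # '\xa4' is U+00A4, the '¤' in the question

# b2a_hex(digest()) gives the same hex text that hexdigest() returns directly
print(md5hex(text) == hashlib.md5(text.encode('utf-8')).hexdigest())  # True
```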

Python 2 code:

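from binascii import b2a_hex
from Crypto.Hash import MD5  # assumed: PyCrypto / PyCryptodome

def md5hex(data):
    """ return hex string of md5 of the given string """
    h = MD5.new()
    h.update(data)  # Python 2 str is already a byte string; hashed as-is
    return b2a_hex(h.digest())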

Input in Python 3:

>>> md5hex('bf5¤7¤8¤3')
'61d91bafe643c282bd7d7af7083c14d6'

Input in Python 2:

>>> md5hex('bf5¤7¤8¤3')
'46440745dd89d0211de4a72c7cea3720'

What's going on?

EDIT:

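def genurlkey(songid, md5origin, mediaver=4, fmt=1):
    """ Calculate the deezer download url given the songid, origin and media+format """
    # Fields are joined with the single byte 0xA4, not the UTF-8 pair b'\xc2\xa4'
    data = b'\xa4'.join(_.encode("utf-8") for _ in [md5origin, str(fmt), str(songid), str(mediaver)])
    # md5hex must hash these bytes as-is and return bytes for this join to work
    data = b'\xa4'.join([md5hex(data), data]) + b'\xa4'
    # Zero-pad to a multiple of 16 bytes (presumably the AES block size)
    if len(data) % 16:
        data += b'\x00' * (16 - len(data) % 16)
    # hexaescrypt is defined elsewhere (not shown)
    return hexaescrypt(data, "jo6aey6haid2Teih").decode('utf-8')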
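A minimal Python 3 sketch of just the byte-building part, assuming md5hex should hash the raw bytes and return the hex digest as ASCII bytes (that is what the b'\xa4'.join call requires). md5hex_bytes and build_payload are hypothetical names, and hexaescrypt is left out because its definition isn't shown.

```python
import hashlib


def md5hex_bytes(data):
    """Hypothetical bytes-in/bytes-out helper: hex MD5 of raw bytes, as ASCII bytes."""
    return hashlib.md5(data).hexdigest().encode('ascii')


def build_payload(songid, md5origin, mediaver=4, fmt=1):
    """Hypothetical helper: the byte-building part of genurlkey, without encryption."""
    sep = b'\xa4'  # one raw byte, as in the Python 2 original
    fields = [md5origin, str(fmt), str(songid), str(mediaver)]
    data = sep.join(f.encode('utf-8') for f in fields)
    data = sep.join([md5hex_bytes(data), data]) + sep
    # zero-pad up to the next multiple of 16 bytes
    if len(data) % 16:
        data += b'\x00' * (16 - len(data) % 16)
    return data


payload = build_payload(12345, 'abcdef')  # hypothetical song id and origin
print(len(payload) % 16)  # 0
```

Keeping every piece as bytes from the start avoids mixing str and bytes, which Python 3 refuses to do implicitly.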

This whole problem started with the b'\xa4' separator in the Python 2 code of another function; that byte doesn't behave the same way in Python 3.

In Python 2, with that byte, I get the correct MD5 hash...
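The question doesn't say exactly how the byte fails in Python 3. One common way, sketched here as an assumption, is mixing str and bytes: Python 3 never converts between them implicitly. The field values are hypothetical.

```python
sep = b'\xa4'                   # one raw byte, as in the Python 2 code
parts = ['bf5', '7', '8', '3']  # hypothetical text fields

# bytes.join() only accepts bytes-like items, so str fields raise TypeError
try:
    sep.join(parts)
except TypeError as exc:
    print('TypeError:', exc)

# Encoding each field first reproduces the Python 2 byte layout
joined = sep.join(p.encode('utf-8') for p in parts)
print(joined)  # b'bf5\xa47\xa48\xa43'

# Equivalent: join as text with '\xa4' (U+00A4), then encode as Latin-1,
# where U+00A4 maps to the single byte 0xA4
print('\xa4'.join(parts).encode('latin-1') == joined)  # True
```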

Eduardo M asked Dec 09 '16 20:12


2 Answers

Use hashlib and a version-agnostic implementation instead:

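# -*- coding: utf-8 -*-
# (the coding line lets Python 2 read the non-ASCII literal from a script file)
import hashlib
text = u'bf5¤7¤8¤3'
text = text.encode('utf-8')
print(hashlib.md5(text).hexdigest())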

It works in both Python 2 and 3 with the same result:

Python2:

'61d91bafe643c282bd7d7af7083c14d6'

Python3 (via repl.it):

'61d91bafe643c282bd7d7af7083c14d6'

Your code gives different results because the bytes being hashed differ: you only encode explicitly in Python 3 (to UTF-8), while in Python 2 the str is already a byte string and is hashed as-is.
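To see the difference concretely, this Python 3 sketch prints the two byte strings that end up being hashed. Treating the Python 2 input as Latin-1 bytes is an assumption, consistent with the question's b'\xa4' separator.

```python
import hashlib

text = 'bf5\xa47\xa48\xa43'  # the question's 'bf5¤7¤8¤3' (U+00A4 CURRENCY SIGN)

utf8_bytes = text.encode('utf-8')      # what the Python 3 md5hex hashes
latin1_bytes = text.encode('latin-1')  # assumed to be what Python 2 hashed

print(utf8_bytes)    # b'bf5\xc2\xa47\xc2\xa48\xc2\xa43'
print(latin1_bytes)  # b'bf5\xa47\xa48\xa43'
print(len(utf8_bytes), len(latin1_bytes))  # 12 9

# Different input bytes, so different digests
print(hashlib.md5(utf8_bytes).hexdigest() != hashlib.md5(latin1_bytes).hexdigest())  # True
```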


If you need it to match the un-encoded Python 2 result:

import hashlib
text = u'bf5¤7¤8¤3'
print(hashlib.md5(text.encode("latin1")).hexdigest())

works:

46440745dd89d0211de4a72c7cea3720

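In Python 2 the str is hashed as raw bytes, and in your session those bytes were Latin-1 (¤ is the single byte 0xA4), not UTF-8 (where ¤ is the two bytes 0xC2 0xA4). Strictly speaking, Python 2's default encoding is ASCII; the Latin-1 bytes come from how the string was entered (terminal or source-file encoding).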
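One way to make the encoding explicit, sketched for Python 3 with a hypothetical encoding parameter: text is encoded with whatever encoding you choose, and bytes are hashed as-is.

```python
import hashlib


def md5hex(data, encoding='utf-8'):
    """Hex MD5 of data; str input is encoded with `encoding`, bytes are hashed as-is."""
    if isinstance(data, str):
        data = data.encode(encoding)
    return hashlib.md5(data).hexdigest()


text = 'bf5\xa47\xa48\xa43'
# Latin-1 text hashes the same bytes as the raw Python 2-style byte string
print(md5hex(text, 'latin-1') == md5hex(b'bf5\xa47\xa48\xa43'))  # True
print(md5hex(text) == md5hex(text.encode('utf-8')))              # True
```

Passing the encoding explicitly makes the "which bytes?" decision visible at the call site instead of depending on the interpreter version.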

TemporalWolf answered Nov 15 '22 13:11


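In Python 3, str is Unicode text and the default encoding is UTF-8; in Python 2, str is a byte string and the default encoding is ASCII. So even if the strings look identical when printed, the bytes that actually get hashed differ.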
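A quick Python 3 check of these points (on Python 2, sys.getdefaultencoding() normally returns 'ascii' instead):

```python
import sys

print(sys.getdefaultencoding())  # utf-8 on Python 3

text = '\xa4'         # one character: U+00A4 '¤'
data = text.encode()  # encode() with no argument uses UTF-8 in Python 3
print(len(text), len(data))  # 1 2
print(text == data)          # False: str and bytes never compare equal in Python 3
```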

Alex Baranowski answered Nov 15 '22 12:11