Django: Is Base64 of md5 hash of email address under 30 characters?

Tags: python, django

For a few hours I have been investigating the best way to use the email address instead of the username in Django authentication. This topic has been discussed many times, but the answers I have found are inconsistent.

1) The answer here points to a snippet that tells a username from an email address simply by checking for an '@' character. However, the maximum lengths of the email and username fields are not equal, and the answer does not take that into account.

2) The second answer from the same link, by S.Lott (13 votes), does some black magic with admin.site. I don't understand what the code is doing. Is this the accepted, short-and-sweet way of doing it?

3) Then I found this solution, which seems almost perfect (and makes sense to me):

import uuid
username = uuid.uuid4().hex[:30]

It takes only the first 30 characters of a randomly generated UUID (in its 32-character hex form) as the username. But there is still a chance of collision. Then I came across a post where someone claimed

A base64 encoding of an md5 hash has 25 characters

If that's true, couldn't we take the Base64 encoding of an MD5 hash of the email address and guarantee 100% unique usernames that are also under 30 characters? If so, how could this be achieved?
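As a rough check of point 3, here is a Python 3 sketch (not taken from the linked solution; the helper name `collision_probability` is hypothetical). A version-4 UUID has 128 bits, of which 6 are fixed (the version nibble `4` and two variant bits), so 122 are random. Cutting the 32-character hex string to 30 characters drops the last 8 random bits and leaves 114, so a collision is possible in principle but, by the usual birthday-bound approximation, vanishingly unlikely at any realistic number of users.

```python
import uuid

u = uuid.uuid4()
full = u.hex            # 32 lowercase hex characters = 128 bits
username = full[:30]    # the truncation from point 3

# In a version-4 UUID, hex digit 12 is always '4' (the version) and hex digit 16
# holds the 2 fixed variant bits, so only 122 of the 128 bits are random.
assert len(full) == 32 and full[12] == "4" and full[16] in "89ab"

# Dropping the last 2 hex digits (which are random) removes 8 random bits.
random_bits = 122 - 8   # 114


def collision_probability(n, bits=random_bits):
    """Hypothetical helper: birthday-bound estimate n^2 / 2^(bits + 1)."""
    return n * n / 2 ** (bits + 1)


print(len(username))                 # 30
print(collision_probability(10**9))  # estimate for a billion usernames
```

The estimate is an approximation that only holds while it is much smaller than 1; it says nothing about a deterministic scheme such as hashing the email address.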

Many thanks,

asked Feb 21 '23 03:02 by Houman


2 Answers

You can do it like this:

>>> from hashlib import md5
>>> h = md5('[email protected]').digest().encode('base64')[:-1]
>>> h
'Vlj/zO5/Dr/aKyJiOLHrbg=='
>>> len(h)
24

You can ignore the last character because it's just a newline. The chance of collision is the same as for the MD5 hash itself, since encoding in Base64 loses no information:

>>> original = md5('[email protected]').digest()
>>> encoded = original.encode('base64')
>>> original == encoded.decode('base64') 
True
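The two snippets above are Python 2: in Python 3, `str.encode('base64')` no longer works (bytes have no `.encode`, and 'base64' is not a text encoding). A rough Python 3 equivalent uses the `base64` module; the address `user@example.com` below is a hypothetical stand-in, since the original one is redacted on this page. `base64.encodebytes` behaves like the old codec and appends a newline, which is where a 25-character figure comes from, while `base64.b64encode` adds none.

```python
import base64
from hashlib import md5

email = "user@example.com"  # hypothetical address; hashlib needs bytes, so encode it
digest = md5(email.encode("utf-8")).digest()  # always 16 raw bytes

# encodebytes mirrors Python 2's 'base64' codec: it appends a trailing newline.
with_newline = base64.encodebytes(digest)
print(len(with_newline))  # 25 (24 characters plus b"\n")

# b64encode adds no newline, so no [:-1] slice or strip() is needed.
h = base64.b64encode(digest).decode("ascii")
print(h, len(h))  # length is 24

# Base64 is reversible: decoding returns exactly the original 16 bytes.
print(base64.b64decode(h) == digest)  # True
```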
answered Feb 23 '23 00:02 by hcalves


MD5 hashes are always 16 bytes long, and Base64 encodes each group of 3 bytes as 4 characters; thus 16 / 3 rounded up gives 6 groups, and 6 * 4 = 24 characters for an MD5 hash encoded to Base64.
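The arithmetic can be checked directly. Because 16 = 3 * 5 + 1, the last group holds a single byte and is padded with `==`, so only 22 of the 24 characters carry data. If `/` and `=` are unwelcome in a username (Django's default username validator accepts only letters, digits and `@ . + - _`), one option, sketched here as an assumption rather than something either answer proposes, is URL-safe Base64 with the padding stripped. `b64_length` and `hashed_username` are hypothetical helper names.

```python
import base64
import math
import re
from hashlib import md5


def b64_length(n_bytes):
    """Padded Base64 length: every started group of 3 bytes becomes 4 characters."""
    return 4 * math.ceil(n_bytes / 3)


print(b64_length(16))  # 24 for a 16-byte MD5 digest


def hashed_username(email):
    """Hypothetical helper: URL-safe Base64 of the MD5 digest, '=' padding removed."""
    digest = md5(email.encode("utf-8")).digest()
    # urlsafe_b64encode uses '-' and '_' instead of '+' and '/'.
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


name = hashed_username("user@example.com")  # hypothetical address
print(len(name))  # 22

# Same character class as Django's default username validator pattern.
print(re.fullmatch(r"[\w.@+-]+", name) is not None)  # True
```

This only changes how the digest is spelled; the collision caveat about MD5 still applies.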

However, note that the Wikipedia article on MD5 states:

However, it has since been shown that MD5 is not collision resistant.

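So you cannot count on this method to give you unique usernames from email addresses (and, more generally, no fixed-length hash can be guaranteed unique for every possible input). Producing the hashes themselves is very easy with the help of the hashlib module: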

>>> from hashlib import md5
>>> md5('[email protected]').digest().encode('base64').strip()
'862kBc6JC2+CBAlN6xLYqA=='
answered Feb 22 '23 22:02 by Martijn Pieters