
Python shortest unique id from strings

I have more than 100 million unique strings (VARCHAR(100) UNIQUE in a MySQL database). I currently use the code below to create a unique hash from each of them (VARCHAR(32) UNIQUE) in order to reduce the index size of the InnoDB table (a unique index on a VARCHAR(100) field is roughly 3 times larger than one on a VARCHAR(32) field).

id = hashlib.md5(str).hexdigest()

Is there any other method that creates shorter ids from those strings while still giving reasonable uniqueness guarantees?

jack asked Jun 19 '12 06:06



2 Answers

You can store it as an integer:

id_ = int(hashlib.md5(your_str).hexdigest(), 16)

Or as a binary string:

id_ = hashlib.md5(your_str).digest()
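
A minimal sketch of both forms under Python 3, where `hashlib.md5` expects bytes, so the string is encoded first (UTF-8 and the sample string are assumptions, not part of the answer above). The raw digest is 16 bytes, half the length of the 32-character hex form, so it could go in a fixed-width binary column such as BINARY(16). The integer form needs up to 128 bits, which is wider than a 64-bit BIGINT.

```python
import hashlib

your_str = "example string"      # hypothetical sample value
data = your_str.encode("utf-8")  # assumption: the strings are UTF-8 text

hex_id = hashlib.md5(data).hexdigest()  # 32 hex characters
bin_id = hashlib.md5(data).digest()     # the same hash as 16 raw bytes
int_id = int(hex_id, 16)                # the same hash as one integer (up to 128 bits)

print(len(hex_id), len(bin_id), int_id.bit_length() <= 128)
```

All three values carry the same 128 bits; only the storage representation differs.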
simplylizz answered Oct 06 '22 01:10


One crude approach: compute the MD5 and keep only the first 16 hex characters (64 bits) instead of all 32. Collisions will still be rare, so you keep a reasonable uniqueness guarantee.
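
To put a rough number on "reasonable": 16 hex characters are 64 bits, and the birthday approximation p ≈ n² / (2 · 2^bits) estimates the chance of at least one collision among n ids. The sketch below assumes MD5 output behaves like uniformly random bits and that the strings are UTF-8 text; the function names are illustrative, not from the answer.

```python
import hashlib

def short_id(s: str, hex_chars: int = 16) -> str:
    """Return the first hex_chars hex characters of the MD5 of s (hypothetical helper)."""
    return hashlib.md5(s.encode("utf-8")).hexdigest()[:hex_chars]

def collision_probability(n: int, bits: int) -> float:
    """Birthday-bound approximation of P(at least one collision); only valid when the result is small."""
    return n * n / (2 * 2 ** bits)

# 100 million ids, each truncated to 64 bits
p64 = collision_probability(100_000_000, 64)

print(short_id("example string"))
print(f"{p64:.2e}")
```

For 100 million strings this estimate comes out at roughly 3 in 10,000 for the whole table. Since the column is declared UNIQUE, a collision would surface as a failed insert rather than as silent data corruption.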

Hrishikesh answered Oct 05 '22 23:10