Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Good algorithm for unique random generation of IDs

Tags:

I need to generate a random ID, alphanumeric, 6 characters, as the ID for shortlink service.

Currently, I generate a random 6-character code, look up in the db to see if it has been used before, and if it has, repeat the process. I need it to be unique for all 36^6 combinations. As the system grows, the worse its performance.

Is there known good approach that minimizes hitting the db, preserves state globally, and will not take more than 100ms to lookup?

Thx for any help

like image 983
metalaureate Avatar asked Oct 05 '13 23:10

metalaureate


People also ask

How are unique IDs created?

The simplest way to generate identifiers is by a serial number. A steadily increasing number that is assigned to whatever you need to identify next. This is the approached used in most internal databases as well as some commonly encountered public identifiers.

What is Snowflake ID generator?

Twitter created its own random ID generator called Twitter Snowflake to create unique IDs for tweets, users, messages and all other objects that need identification. Discord, Instagram and other social networking systems also use similar programs to create IDs.

How does MySQL generate unique ID?

This function in MySQL is used to return a Universal Unique Identifier (UUID) generated according to RFC 4122, “A Universally Unique Identifier (UUID) URN Namespace”. It is designed as a number that is universally unique. Two UUID values are expected to be distinct, even they are generated on two independent servers.


1 Answers

Use sequential numbers, so you will never have to hit the database searching for whether a key already exists. You just need to keep track of the maximum number assigned so far.

To make this sequential ID alphanumeric, convert it to base 36.

If you don't like the fact that the first IDs assigned will then look like 000000, 000001, ... 00000z, 000010,... , you can make use of a mathematical trick: Internally keep the sequential, numeric ID, but for the external representation visible to the user, multiply it (mod 36^6) by a large prime smaller than 36^6-1 - 1679979167 would be a decent choice - before converting to base 36. This will make your IDs appear random to the user even though they really aren't.

Here's a python sample with output:

def baseN(num,b,numerals="0123456789abcdefghijklmnopqrstuvwxyz"):     return ((num == 0) and numerals[0]) or (baseN(num // b, b, numerals).lstrip(numerals[0]) + numerals[num % b])  for internalID in range(1,200):     mangled = (internalID*1679979167)%(36**6)     print internalID, mangled, baseN(mangled,36) 

(baseN code is jellyfishtree's from here)


1 1679979167 rs7s7z 2 1183175998 jkfkfy 3 686372829 bcncnx 4 189569660 34v4vw 5 1869548827 ux2x3v 6 1372745658 mpapbu 7 875942489 ehihjt 8 379139320 69q9rs 9 2059118487 y1y1zr 10 1562315318 pu5u7q 11 1065512149 hmdmfp 12 568708980 9eleno 
like image 50
us2012 Avatar answered Sep 20 '22 22:09

us2012