
Tinyurl-style unique code: potential algorithm to prevent collisions

I have a system that requires a unique 6-character code to represent an object, and I'm trying to think of a good algorithm for generating them. Here are the requirements:

  • I'm using a base-20 system (no caps, numbers, vowels, or l to prevent confusion and naughty words)
    • Six base-20 characters allow 20^6 = 64 million combinations
  • I'll be inserting potentially 5-10 thousand entries at once, so in theory I'd use bulk inserts, which means relying on a unique key to catch duplicates probably won't be efficient or pretty (especially if collisions become frequent)
  • It's not out of the question to fill up 10% of the combinations, so there's a high potential for collisions
  • I want to make sure the codes are non-consecutive
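To make the base-20 requirement concrete, here is a minimal sketch of converting an integer in the range 0 to 63,999,999 into a 6-character code and back. The exact alphabet ordering and the names `ALPHABET`, `encode`, and `decode` are assumptions for illustration; the question only says the alphabet excludes caps, numbers, vowels, and l.

```python
# Assumed alphabet: lowercase letters minus the vowels a, e, i, o, u and minus l.
ALPHABET = "bcdfghjkmnpqrstvwxyz"
CODE_LENGTH = 6
KEY_SPACE = len(ALPHABET) ** CODE_LENGTH  # 20**6 = 64,000,000


def encode(n):
    """Convert an integer in [0, KEY_SPACE) to a fixed-width base-20 code."""
    if not 0 <= n < KEY_SPACE:
        raise ValueError("n is outside the key space")
    chars = []
    for _ in range(CODE_LENGTH):
        n, digit = divmod(n, len(ALPHABET))
        chars.append(ALPHABET[digit])
    # Digits were produced least-significant first, so reverse them.
    return "".join(reversed(chars))


def decode(code):
    """Convert a base-20 code back to its integer value."""
    n = 0
    for ch in code:
        n = n * len(ALPHABET) + ALPHABET.index(ch)
    return n


print(encode(0))              # bbbbbb
print(encode(KEY_SPACE - 1))  # zzzzzz
```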

I had an idea that sounded like it would work, but I'm not good enough at math to figure out how to implement it: if I start at 0 and increment by N, then convert to base-20, it seems like there should be some value for N that lets me count each value from 0-63,999,999 before repeating any.

For example, stepping through 0 to 9 with N=3 (adding 3 each time, modulo 10): 0, 3, 6, 9, 2, 5, 8, 1, 4, 7.
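The small example can be reproduced with a few lines of Python. This is only a sketch of the idea; `stepped_sequence` is a hypothetical helper name. The second call shows, for contrast, a step that does repeat before covering the range.

```python
def stepped_sequence(step, modulus):
    """Return 0, step, 2*step, ... reduced modulo `modulus`, for `modulus` terms."""
    return [(i * step) % modulus for i in range(modulus)]


print(stepped_sequence(3, 10))  # [0, 3, 6, 9, 2, 5, 8, 1, 4, 7]
print(stepped_sequence(4, 10))  # [0, 4, 8, 2, 6, 0, 4, 8, 2, 6]
```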

Is there some magic math method for figuring out values of N for some larger number that is able to count through the whole range without repeating? Ideally, the number I choose would sort of jump around the set such that it wasn't obvious that there was a pattern, but I'm not sure how possible that is.

Alternatively, a hashing algorithm that guaranteed uniqueness for values 0-64 million would work, but I'm way too dumb to know if that's possible.

Dan Breen asked Aug 10 '09 23:08


2 Answers

All you need is a step size that shares no common factor (other than 1) with the size of your key space. The easiest choice is a prime number that does not divide the key space. You can google for large primes, or use http://primes.utm.edu/lists/small/10000.txt
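One way to sanity-check a candidate step, sketched here in Python: the sequence 0, N, 2N, ... modulo M visits exactly M / gcd(N, M) distinct values, so it covers the whole range only when gcd(N, M) is 1. The helper names `cycle_length` and `brute_force_cycle_length` are made up for this illustration.

```python
from math import gcd

KEY_SPACE = 20 ** 6  # 64,000,000 = 2**12 * 5**6


def cycle_length(step, modulus):
    """Distinct values visited by 0, step, 2*step, ... (mod modulus)."""
    return modulus // gcd(step, modulus)


def brute_force_cycle_length(step, modulus):
    """Same quantity, counted by walking the sequence (small moduli only)."""
    seen = set()
    value = 0
    while value not in seen:
        seen.add(value)
        value = (value + step) % modulus
    return len(seen)


# The formula agrees with brute force on the small example from the question.
for step in (3, 4, 5, 7):
    assert cycle_length(step, 10) == brute_force_cycle_length(step, 10)

print(cycle_length(7, KEY_SPACE) == KEY_SPACE)  # True: 7 shares no factor with 20**6
print(cycle_length(15, KEY_SPACE))              # 12800000: 15 shares the factor 5
```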

Cullen Walsh answered Oct 09 '22 09:10


Any prime number that is not a factor of the length of the sequence should be able to span the sequence without repeating. For 64,000,000 (which is 2^12 × 5^6), that means you shouldn't use 2 or 5. Of course, if you don't want the codes to be generated consecutively, stepping by only 2 or 5 would not be a good choice anyway. I personally like the number 73973!
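Putting this together, a rough sketch of a generator that hands out codes by stepping 73973 at a time through the 64,000,000-value key space might look like the following. The alphabet ordering and the names `encode`, `codes`, and `first_index` are assumptions for illustration, not part of the original question.

```python
from itertools import islice

# Assumed alphabet: lowercase letters minus vowels and l (20 symbols).
ALPHABET = "bcdfghjkmnpqrstvwxyz"
CODE_LENGTH = 6
KEY_SPACE = len(ALPHABET) ** CODE_LENGTH  # 64,000,000
STEP = 73973  # odd and not a multiple of 5, so it shares no factor with KEY_SPACE


def encode(n):
    """Convert an integer in [0, KEY_SPACE) to a 6-character base-20 code."""
    chars = []
    for _ in range(CODE_LENGTH):
        n, digit = divmod(n, len(ALPHABET))
        chars.append(ALPHABET[digit])
    return "".join(reversed(chars))


def codes(first_index=0):
    """Yield codes for indexes first_index, first_index + 1, ... without repeats."""
    for i in range(first_index, KEY_SPACE):
        yield encode((i * STEP) % KEY_SPACE)


# Take a batch of 10,000 codes for one bulk insert.
batch = list(islice(codes(), 10_000))
print(len(set(batch)) == len(batch))  # True: no collisions within the batch
```

Only the next unused index needs to be stored between runs; a later bulk insert could resume with `codes(first_index=10_000)`.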

Nick Lewis answered Oct 09 '22 09:10