Efficient method to find collision free random numbers

Tags:

I have a users table, the user ID is public. But I want to obfuscate the number of registered user and trends of the project, so I don't want to have public incrementing IDs.

When a new user is created I want to find a random integer number that is greater than a certain number and that is not yet in the database.

Naive code:

<?php
    $found = false;
    while(!$found) {
      $uid = rand(1000000000,4294967295) // find random number betwen minimum and maximum
      $dbh->beginTransaction();
      // check if user id is in use, and if not insert it
      if($dbh->query("SELECT * FROM users WHERE uid = $uid")) {
        $dbh->exec("INSERT INTO users (uid) VALUES ($uid)");
        $found = true;
      }
      $dbh->commit();
    }
    // we just got our new uid ...
?>

This will work it however may become inefficient. True that there is a big range and the probability of hitting an unused uid is high. But what if I want to use a smaller range, because I don't want to have so long userids?

Example of my concerns:

60% of all user ids are in use
the chance of hitting an unused uid are 0.4
the first attempt has 0.4% success rate
if 1st not successful the second attempt has 0.6*0.4 probability
so with a maximum of two tries i have 0.4 + 0.6*0.4 proability (is that right??)

So one method to optimize is that came to my mind is the following:

find a random number, check if its free, if not, increment it by 1 and try again and so on
if the maximum number is hit, continue with the minimum number

That should give me a number with a maximum runtime of O(range)

That sounds pretty bad but I think it is not, because I submit random numbers to the database and that they are all at the beginnig is very unlikely. So how good/bad is it really?

I think this would work just fine but I want it BETTER

So what about this?

find a random number
query the database for how many numbers are occupied in the range whole range, starting from that number (this first step is trivial...)
if there are numbers occupied in that range, divide the range by half and try again. starting with the initial number
if there are numbers occupied divide the range by half and try again. starting with the initial number

If I am thinking correctly this will give ma a number with a maximum of O(log(range)) time.

That is pretty satisfying because log() is pretty good. However I think this method will often be as bad as possible. Because with our random numbers we will probably always hit numbers in the large intervals.

So at the beginning our pure random method is probably better.

So what about having a limit like this

select current number of used numbers
is it greater than X, logarithmic range approach
if it is not, use pure random method

What would X be and why?

So final question:

This is pretty easy and pretty complicated at the same time.

I think this is a standard problem because lots and lots of system use random ids (support tickets etc), so I cannot imagine I am the first one to stumble across this.

How would you solve this? Any input is appriciated!

Is there maby an existing class / procedure for this I can use?

Or maby some database functions that I can use?

I would like to do it in PHP/Mysql

IMPORTANT EDIT:

I just thought about the range/logarithmic solution. It seems to be complete bullshit sorry for my wording because:

what if i hit an occupied number at start?

Then I am dividing my range so long if it is only 1. And even then the number is occoupied.

So its completely the same as the pure random method from start, only worse....

I am a bit embarassed I made this up but I will leave it in because I think its a good example of overcomplicated thinknig!

785

asked Jul 24 '11 17:07

The Surrican

1 Answers

If p is the proportion of ids in use, your "naive" solution will, on average, require 1/(1-p) attempts to find an unused id. (See Exponential distribution). In the case of 60% occupancy, that is a mere 1/0.4 = 2.5 queries ...

Your "improved" solution requires about log(n) database calls, where n is the number of ids in use. That is quite a bit more than the "naive" solution. Also, your improved solution is incomplete (for instance, it does not handle the case where all number in a subrange are taken, and does not elaborate with subrange you recurse into) and is more complex to implement to boot.

Finally, note that your implementation will only be thread safe if the database provides very strict transaction isolation, which scales poorly, and might not be the default behaviour of your database system. If that turns out to be a problem, you could speculatively insert with a random id, and retry in the event of a constraint violation.

answered Oct 05 '22 06:10

meriton

Related questions
                            
                                Maintain session between HttpUrlConnection Calls (Native/Webview)
                            
                                Mysterious number icon next to file name in PhpStorm [closed]
                            
                                What does the keyword "callable" do in PHP
                            
                                How to implement array type route in Laravel?
                            
                                Good Zend Framework example apps to learn from [closed]
                            
                                Retrieving Multiple Result sets with stored procedure in php/mysqli
                            
                                Razor syntax PHP equivalent
                            
                                PHP sanitize user data for use in header() function
                            
                                Speed of PHP vs JavaScript?
                            
                                CIDR bitwise operations - could I be a bit wiser?
                            
                                Parse error: syntax error, unexpected ')', expecting :: (T_PAAMAYIM_NEKUDOTAYIM) [duplicate]
                            
                                How to add link in subject of email (gmail)
                            
                                Is there a way to test if a closure is also a generator?
                            
                                Client Server REST API captcha implementation
                            
                                Warning on ZipArchive Close
                            
                                How to handle pagination queries properly with mongodb and php?
                            
                                Sanitize user defined CSS in PHP
                            
                                PHP - What is a good way to produce a short alphanumeric string from a long md5 hash?
                            
                                Problem when sending mail with Zend Mail?
                            
                                How to structure a project in Symfony2?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Efficient method to find collision free random numbers

Tags:

php

random

mysql

primary-key

The Surrican

People also ask

1 Answers

meriton

Recent Activity

Donate For Us