I have a users table, the user ID is public. But I want to obfuscate the number of registered user and trends of the project, so I don't want to have public incrementing IDs.
When a new user is created I want to find a random integer number that is greater than a certain number and that is not yet in the database.
Naive code:
<?php
$found = false;
while(!$found) {
$uid = rand(1000000000,4294967295) // find random number betwen minimum and maximum
$dbh->beginTransaction();
// check if user id is in use, and if not insert it
if($dbh->query("SELECT * FROM users WHERE uid = $uid")) {
$dbh->exec("INSERT INTO users (uid) VALUES ($uid)");
$found = true;
}
$dbh->commit();
}
// we just got our new uid ...
?>
This will work it however may become inefficient. True that there is a big range and the probability of hitting an unused uid is high. But what if I want to use a smaller range, because I don't want to have so long userids?
Example of my concerns:
So one method to optimize is that came to my mind is the following:
That should give me a number with a maximum runtime of O(range)
That sounds pretty bad but I think it is not, because I submit random numbers to the database and that they are all at the beginnig is very unlikely. So how good/bad is it really?
I think this would work just fine but I want it BETTER
So what about this?
If I am thinking correctly this will give ma a number with a maximum of O(log(range)) time.
That is pretty satisfying because log() is pretty good. However I think this method will often be as bad as possible. Because with our random numbers we will probably always hit numbers in the large intervals.
So at the beginning our pure random method is probably better.
So what about having a limit like this
What would X be and why?
So final question:
This is pretty easy and pretty complicated at the same time.
I think this is a standard problem because lots and lots of system use random ids (support tickets etc), so I cannot imagine I am the first one to stumble across this.
How would you solve this? Any input is appriciated!
Is there maby an existing class / procedure for this I can use?
Or maby some database functions that I can use?
I would like to do it in PHP/Mysql
IMPORTANT EDIT:
I just thought about the range/logarithmic solution. It seems to be complete bullshit sorry for my wording because:
Then I am dividing my range so long if it is only 1. And even then the number is occoupied.
So its completely the same as the pure random method from start, only worse....
I am a bit embarassed I made this up but I will leave it in because I think its a good example of overcomplicated thinknig!
We can use srand () and rand () function to generate random numbers between 0 and 1. Note that we have to cast the output of rand () function to the decimal value either float or double.
If p
is the proportion of ids in use, your "naive" solution will, on average, require 1/(1-p) attempts to find an unused id. (See Exponential distribution). In the case of 60% occupancy, that is a mere 1/0.4 = 2.5 queries ...
Your "improved" solution requires about log(n) database calls, where n is the number of ids in use. That is quite a bit more than the "naive" solution. Also, your improved solution is incomplete (for instance, it does not handle the case where all number in a subrange are taken, and does not elaborate with subrange you recurse into) and is more complex to implement to boot.
Finally, note that your implementation will only be thread safe if the database provides very strict transaction isolation, which scales poorly, and might not be the default behaviour of your database system. If that turns out to be a problem, you could speculatively insert with a random id, and retry in the event of a constraint violation.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With