Random strings to identify a record?

Question

I want to do something similair to what imgur and most other websites do: I want to use a random string in the URL to identify whatever post a user is looking for.

Using a random string like that as a primary key would probably not be a very good idea, and making sure the randomly generated string isn't taken already, while the user is sending a submission, would slow down a table over time, as it would need to check more and more records to make sure there are no duplicates. How would one go about implementing random strings like that for identification?

My idea, and please tell me if it's a really bad idea, is to have a table that is filled with these random strings. The table would look like this:

| submissionId | stringId 
+--------------+----------
| 1            | rbMZV    
+--------------+----------
| 2            | MQyPi    
+--------------+----------
| NULL         | hfXL7

When these strings are generated, they don't have a submissionId assigned, like "hfXL7" in my example table. When a submission is made by a user, my script will take the first randomly generated string that doesn't have a submissionId assigned yet and adds the submissionId generated when the submission was made to that record. I have a process somewhere that regularly generates more strings that can be used as people make more submissions, so when someone makes a submission, there is always at least one randomly generated string without a submissionId yet.

Steve Chambers · Accepted Answer

Here are three basic approaches:

Generate and store all the random IDs up front - sufficiently many that they are never likely to run out (given the predicted total number of uses). One downside here is it may be difficult to predict the total number of IDs required to support the lifetime of the system.
Generate a sufficient number of random IDs to provide more than enough for a set time period. Then periodically generate enough new ones to cater for predicted demand. (E.g. the time period might be one day and the generator might be scheduled to run at some point during the night when demand is low.)
Generate random IDs on the fly - only as needed.

There are pros and cons of each:

If storage isn't a problem, (1) is perhaps the simplest option as once it's done it's done and remains forever - you wouldn't have to worry about failed jobs etc.
(2) is basically your proposed approach: This seems fine but there are more things to consider here such as unpredictable usage spikes, failed scheduled jobs etc.
(3) may also be a simple and keeps it lean as the table will grow over time and there is no need to predict the usage. The potential downside here is that any such function would have keep generating IDs until a unique one is found so it could potentially become slower as the number of IDs increases - although this may never be a problem as long as the number of different random permutations is significantly larger than the potential total number of usages.

Demo for approach (3) above

Online demo of how an on-the-fly generator could be implemented in MySQL: http://rextester.com/TKGPZ41053

Number of permutations calculation

If case-sensitive alphanumerics, there are a total of 62 different characters. So the number of possible permutations for each length is as follows:

Characters | Permutations
1          | 62
2          | 3844
3          | 238328
4          | 14776336
5          | 916132832
6          | 56800235584
7          | 3521614606208
8          | 218340105584896
9          | 13537086546263552
10         | 839299365868340224

Random strings to identify a record?

Tags:

sql

database

1 Answers

Steve Chambers

Recent Activity

Donate For Us

Random strings to identify a record?

Tags:

sql

database

1 Answers

Steve Chambers

Related questions

Recent Activity

Donate For Us