I'm implementing a URL shortening feature in my application in order to provide my users shorter alternative URLs that can be used in Twitter. The point is to be independent from the shortening services that offer this same service and include it as a feature of my web app.
What's the best way to create an unique random sequence of characters of about 6 chars? I plan to use that as an index for the items in my database that will have the alternative URLs.
Edited:
This feature will be used in a job board website, where every new job ad will get a custom URL with the title plus the shorter one to be used in Twitter. That said, the total number of unique 6 char combinations will be more than enough for a long time.
The rand() function generates the random alphabets in a string and srand() function is used to seed the rand() function. Initially, we have a program that sets the array of alphabets size as “ch_Max,” which is of int char data type.
Do you really need 'random', or would 'unique' be sufficient?
Unique is extremely simple - just insert the URL into a database, and convert the sequential id for that record to a base-n number which is represented by your chosen characterset.
For example, if you want to only use [A-Z] in your sequence, you convert the id of the record to a base 26 number, where A=1, B=2,... Z=26. The algothithm is a recursive div26/mod26, where the quotient is the required character and the remainder is used to calculate the next character.
Then when retrieving URL, you perform the inverse function, which is to convert the base-26 number back to decimal. Perform SELECT URL WHERE ID = decimal, and you're done!
EDIT:
private string alphabet = "abcdefghijklmnopqrstuvwxyz";
// or whatever you want. Include more characters
// for more combinations and shorter URLs
public string Encode(int databaseId)
{
string encodedValue = String.Empty;
while (databaseId > encodingBase)
{
int remainder;
encodedValue += alphabet[Math.DivRem(databaseId, alphabet.Length,
out remainder)-1].ToString();
databaseId = remainder;
}
return encodedValue;
}
public int Decode(string code)
{
int returnValue;
for (int thisPosition = 0; thisPosition < code.Length; thisPosition++)
{
char thisCharacter = code[thisPosition];
returnValue += alphabet.IndexOf(thisCharacter) *
Math.Pow(alphabet.Length, code.Length - thisPosition - 1);
}
return returnValue;
}
The simplest way to make unique sequences is to do this sequentially, ie: aaaaaa aaaaab aaaaac ... These aren't necessarily the prettiest, but will guarantee uniqueness for the first 12230590463 sequences (provided you used a-z and A-Z as unique characters). If you need more URLs than that, you'd need to add a seventh char.
They aren't random sequences, though. If you make random ones, just pick a random char of the 48, 6 times. You'll need to check your existing DB for "used" sequences, though, as you'll be more likely to get collisions.
I would use an autonumber system, and create an algorithm to generate the keys. ie 1 = a, 2 = b, 27 = aa etc.
You can use the database autonumber to guarantee that your URL is unique, and you can calculate the URL possibly in a sproc in the DB or in your business layer?
Additionally you can now index on the incrementing number which is cheap and DB's are optimised for these to be used and hashed as primary/foreign keys as opposed to a variable length random string.
The usefulness of a random generator is limited to preventing users from plugging random URLs in to find things they shouldn't have a link to. If this is not your goal then sequential IDs should work just fine. If you just don't want to give users the impression that they are using "infant" technology (when they see that their job ad is #000001), why not start the sequence at some arbitrary value?
When you state "total number of unique 6 char combinations will be more than enough for a long time" for your random generation have you factored the birthday paradox into your calculations? This is generally the bane of any attempt to create random IDs within a range that is only 1 order of magnitude or less then the expected range that will be needed.
To create truly random IDs, you would need to create a loop that generates a new random value, checks to see if that value has already been used, and then repeats the loop if needed. The birthday paradox means that you quickly get to the point where many of the values generated are already in use (despite only a fraction of the total range being consumed), which causes the program to get slower and slower over time until it is taking thousands of attempts (and database lookups) to generate each ID.
I would suggest you go with the idea of encoding sequential IDs. To avoid the problem of users being able to simply increment/decrement the value in the URL to "explore", you can use a combination bit shifting and an alternate ordered list of letters (instead of 1=a, 2=b use 1=t, 2=j, etc).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With