
Ruby on Rails - generating bit.ly style identifiers

Tags: url, uuid, ruby, short

I'm trying to generate UUIDs in the same style as bit.ly URLs, like:

http://bit [dot] ly/aUekJP

or cloudapp ones:

http://cl [dot] ly/1hVU

which are even shorter.

How can I do it? I'm using the UUID gem for Ruby, but I'm not sure whether it's possible to limit the length and get something like this. Currently I do this:

UUID.generate.split("-")[0] # => "b9386070"

But I would like something even shorter, while still knowing that it will be unique.

Any help would be pretty much appreciated :)


Edit note: replaced the dots with [dot] to work around the ban on short links.

zanona asked Jul 19 '10 14:07


2 Answers

You are confusing two different things here. A UUID is a universally unique identifier. It has a very high probability of being unique even if millions of them are being created all over the world at the same time. It is generally displayed as a 36-character string (32 hexadecimal digits plus four hyphens). You cannot chop off the first 8 characters and expect the result to still be unique.
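To put a number on that: the first group of a UUID is 8 hexadecimal characters, i.e. only 32 bits. The sketch below uses the standard library's SecureRandom.uuid as a stand-in for the UUID gem (which may produce a different UUID version) and assumes those 32 bits are random; under that assumption, the birthday bound estimates how many truncated IDs you can create before a collision becomes a coin flip.

```ruby
require "securerandom"

# First dash-separated group of a random (version 4) UUID, e.g. "b9386070".
def truncated_uuid
  SecureRandom.uuid.split("-").first
end

# Number of distinct 8-hex-character values: 16^8 = 2^32.
SPACE = 16**8

# Birthday bound: about sqrt(2 * N * ln 2) random draws from N values
# give a 50% chance that at least two of them are equal.
def draws_for_even_odds(space)
  Math.sqrt(2 * space * Math.log(2)).round
end

puts truncated_uuid.length        # prints 8
puts SPACE                        # prints 4294967296
puts draws_for_even_odds(SPACE)   # roughly 77,000
```

In other words, a few tens of thousands of truncated IDs already carry a substantial collision risk.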

Bitly, TinyURL et al. store links and generate a short code to represent each one. They do not reconstruct the URL from the code; they look the code up in a data store and return the corresponding URL. These codes are not UUIDs.
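A minimal in-memory sketch of that lookup model. The class LinkStore and its methods are hypothetical; a real service would use a database table whose auto-increment primary key plays the role of @next_id. Integer#to_s(36) is used only because it is the simplest built-in encoding.

```ruby
# Hypothetical in-memory link shortener: the short code is only a key
# into a stored table; the URL is never reconstructed from the code itself.
class LinkStore
  def initialize
    @urls = {}      # numeric id => URL
    @next_id = 1    # stands in for a database auto-increment key
  end

  # Store the URL under a fresh numeric id and return that id as a code.
  def shorten(url)
    id = @next_id
    @next_id += 1
    @urls[id] = url
    id.to_s(36)
  end

  # Decode the code back to the numeric key and look the URL up.
  # Returns nil for malformed codes and for codes that were never issued.
  def resolve(code)
    return nil unless code.match?(/\A[0-9a-z]+\z/)
    @urls[code.to_i(36)]
  end
end

store = LinkStore.new
code = store.shorten("http://example.com/some/long/path")
puts code                 # prints 1
puts store.resolve(code)  # prints http://example.com/some/long/path
```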

Without knowing your application it is hard to advise on which method you should use. However, you could store whatever you are pointing at in a data store with a numeric key and then re-encode that key in base 32, using the 10 digits and 22 of the lowercase letters, perhaps choosing them to avoid obvious typo problems such as 'o', 'i' and 'l'.
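A sketch of that re-encoding step in plain Ruby. The alphabet is an assumption: the 10 digits plus the 22 lowercase letters left after dropping i, l, o and u (the same set Crockford's Base32 uses), so encoded keys never contain those look-alike letters.

```ruby
# 10 digits + 22 lowercase letters (no i, l, o, u) = 32 symbols.
ALPHABET = "0123456789abcdefghjkmnpqrstvwxyz".freeze
BASE = ALPHABET.length # 32

# Convert a non-negative integer key to a base-32 string.
def encode(id)
  unless id.is_a?(Integer) && id >= 0
    raise ArgumentError, "id must be a non-negative Integer"
  end
  return ALPHABET[0] if id.zero?

  digits = []
  while id > 0
    id, rem = id.divmod(BASE)
    digits << ALPHABET[rem]   # least significant digit first
  end
  digits.reverse.join
end

# Convert a base-32 string back to the integer key.
def decode(code)
  code.each_char.reduce(0) do |acc, ch|
    digit = ALPHABET.index(ch)
    raise ArgumentError, "invalid character #{ch.inspect}" if digit.nil?
    acc * BASE + digit
  end
end

puts encode(6175601989)   # prints 5r1gda5
puts decode("5r1gda5")    # prints 6175601989
```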

EDIT

On further investigation, there is a Ruby base32 gem available that implements Douglas Crockford's Base32 encoding.
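Part of what makes Crockford's scheme forgiving is how it decodes: input is case-insensitive, hyphens may be ignored, I and L are read as 1, and O is read as 0. The sketch below is a hand-written illustration of just those rules, not the base32 gem's API, and it leaves out Crockford's optional check symbol.

```ruby
# Crockford Base32 symbols (encoding uses uppercase only).
CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ".freeze

# Normalize typed input following Crockford's decoding rules:
# drop hyphens, accept lowercase, read I/L as 1 and O as 0.
def normalize(input)
  input.delete("-").upcase.tr("ILO", "110")
end

# Decode a (possibly sloppily typed) Crockford Base32 string to an Integer.
def crockford_decode(input)
  normalize(input).each_char.reduce(0) do |acc, ch|
    digit = CROCKFORD.index(ch)
    raise ArgumentError, "invalid symbol #{ch.inspect}" if digit.nil?
    acc * 32 + digit
  end
end

# Encode a non-negative Integer with uppercase Crockford symbols.
def crockford_encode(n)
  return "0" if n.zero?
  out = []
  while n > 0
    n, rem = n.divmod(32)
    out << CROCKFORD[rem]
  end
  out.reverse.join
end

puts crockford_encode(1025)     # prints 101
puts crockford_decode("1o-l")   # prints 1025 ("o" read as 0, "l" read as 1)
```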

A 5-character Base32 string can represent over 33 million integers, and a 6-character string over a billion.
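The arithmetic behind those figures is 32 raised to the code length. A small sketch that prints the capacity per length and, conversely, finds the shortest length covering a given number of keys (capacity and min_length are hypothetical helper names):

```ruby
BASE = 32

# How many distinct codes of exactly `len` characters exist.
def capacity(len)
  BASE**len
end

# Smallest code length whose capacity covers `count` distinct keys.
def min_length(count)
  len = 1
  len += 1 while capacity(len) < count
  len
end

(4..7).each { |len| puts "#{len} chars: #{capacity(len)}" }
# 4 chars: 1048576
# 5 chars: 33554432
# 6 chars: 1073741824
# 7 chars: 34359738368
```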

Steve Weet answered Nov 22 '22 15:11


If you are working with numbers, you can use Ruby's built-in Integer#to_s and String#to_i methods, which take a base argument:

6175601989.to_s(30) # => "8e45ttj"

To go back:

"8e45ttj".to_i(30) # => 6175601989

So you don't have to store the codes: you can always decode an incoming short code.
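One caveat when decoding incoming codes: String#to_i stops at the first character that is not valid in the given base and returns whatever it parsed so far, so malformed input still maps to some number. A minimal sketch of a stricter decoder (decode_short_code is a hypothetical name; it accepts only the lowercase form that to_s produces):

```ruby
SHORT_CODE_BASE = 30
# Valid lowercase base-30 symbols are 0-9 and a-t.
SHORT_CODE_FORMAT = /\A[0-9a-t]+\z/

# Returns the Integer for a well-formed base-30 code, or nil otherwise.
def decode_short_code(code)
  return nil unless code.match?(SHORT_CODE_FORMAT)
  code.to_i(SHORT_CODE_BASE)
end

puts "8e45ttj!".to_i(30)          # prints 6175601989 (the "!" is silently ignored)
p decode_short_code("8e45ttj")    # prints 6175601989
p decode_short_code("8e45ttj!")   # prints nil
```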

This works fine as a proof of concept, but you can't avoid ambiguous characters such as 1, l, j, i, 0 and o. If you just want to use the code to obfuscate database record IDs, this will work fine. In general, though, short codes are meant to be easy to remember and to transfer from one medium to another, such as reading one off a presentation slide or hearing it over the phone. If you need to avoid characters that are hard to read or hard to 'hear', you may need to switch to a process where you generate an acceptable code and store it.
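A sketch of that generate-and-store approach: draw random codes from an alphabet with the ambiguous characters (0, 1, i, j, l, o) removed and retry on collision. The alphabet, the default length, the hypothetical CodeGenerator class, and the in-memory Set standing in for a database unique index are all assumptions.

```ruby
require "securerandom"
require "set"

# Digits 2-9 plus lowercase letters except i, j, l, o: 30 symbols.
SAFE_ALPHABET = "23456789abcdefghkmnpqrstuvwxyz".freeze

# Hypothetical generator: random codes, re-drawn until an unused one is found.
class CodeGenerator
  def initialize(length: 6)
    @length = length
    @taken = Set.new   # stands in for a UNIQUE column in the database
  end

  def generate(max_attempts: 100)
    max_attempts.times do
      code = Array.new(@length) do
        SAFE_ALPHABET[SecureRandom.random_number(SAFE_ALPHABET.length)]
      end.join
      return code if @taken.add?(code)   # add? returns nil if already present
    end
    raise "no free code found after #{max_attempts} attempts"
  end
end

gen = CodeGenerator.new(length: 6)
puts gen.generate   # a random 6-character code
```

The attempt limit keeps the loop from spinning forever once the code space is nearly full; at that point a longer code length is the usual remedy.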

J_McCaffrey answered Nov 22 '22 16:11