
C#: shortening a string for a URL

I want to uniquely shorten strings (file ids) for use in URLs, like the ones on bit.ly etc. I could use ids from a database, but I want the URLs to look random.

What would be the best solution?

The site will be a mobile site, so I want the URLs to be as short as possible.

nLL asked Jan 12 '10 21:01



2 Answers

You can't "uniquely shorten" arbitrary strings. Pigeonhole principle and all.

What you want to do (and, AFAIK what url-shortening services do) is keep a database of everything submitted, and the short string used. Then you can look it up in the database.

You can generate the short strings by simply incrementing a number and Base64-encoding it for each new submission.
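A minimal sketch of the counter idea, assuming the counter itself comes from somewhere durable such as an auto-increment column. The `ShortId` class and its method names are hypothetical. Note that it writes the number in base 64 using the URL-safe alphabet ('-' and '_' instead of '+' and '/') rather than calling `Convert.ToBase64String`, so the result needs no escaping and carries no padding.

```csharp
using System;
using System.Text;

public static class ShortId
{
    // URL-safe Base64 alphabet: '-' and '_' replace '+' and '/'.
    private const string Alphabet =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";

    // Converts a non-negative counter value into the shortest base-64 digit string.
    public static string Encode(long value)
    {
        if (value < 0) throw new ArgumentOutOfRangeException(nameof(value));
        if (value == 0) return Alphabet[0].ToString();

        var sb = new StringBuilder();
        while (value > 0)
        {
            sb.Insert(0, Alphabet[(int)(value & 63)]); // lowest 6 bits -> one character
            value >>= 6;
        }
        return sb.ToString();
    }

    // Reverses Encode; throws if a character is outside the alphabet.
    // Assumes the input is short enough not to overflow a long.
    public static long Decode(string id)
    {
        long value = 0;
        foreach (char c in id)
        {
            int digit = Alphabet.IndexOf(c);
            if (digit < 0) throw new FormatException("Invalid character: " + c);
            value = value * 64 + digit;
        }
        return value;
    }
}
```

Sequential counters produce guessable urls. If that matters, the counter value could be scrambled before encoding, but that is outside this sketch.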

Anon. answered Oct 19 '22 07:10


There are two approaches to implementing a mapping service like the one you describe.

  1. Clients submit globally unique ids, or
  2. Server generates globally unique ids

Clients submit globally unique ids

As far as I know, 1. should only be attempted with Guids, unless you devise a similar means to cram sufficiently distinct information into a short byte stream. Either way, if you have a stream of bytes that represent a globally unique identifier, you may do something like this

// source is either a Guid, or some other globally unique byte stream
byte[] bytes = Guid.NewGuid().ToByteArray();
// TrimEnd takes chars, not a string; this strips the trailing "==" padding
string base64String = Convert.ToBase64String(bytes).TrimEnd('=');

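to obtain a user-readable string that appears random, but avoids collisions inherent in other random schemes. A Guid contains 16 bytes, or 128 bits, which Base64 encodes to 24 characters, or 22 once the two trailing '=' padding characters are trimmed. Note that standard Base64 output may also contain '+' and '/', which need escaping or substitution before they go into a url.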
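As a sketch of how the encoded Guid could be made url-safe and reversible, assuming you are happy to swap '+' and '/' for '-' and '_' (the common "base64url" substitution). The `GuidShortener` class and its method names are hypothetical.

```csharp
using System;

public static class GuidShortener
{
    // Encodes a Guid as 22 URL-safe characters:
    // Base64 with the "==" padding removed, '+' -> '-', '/' -> '_'.
    public static string ToShortString(Guid id)
    {
        return Convert.ToBase64String(id.ToByteArray())
            .TrimEnd('=')
            .Replace('+', '-')
            .Replace('/', '_');
    }

    // Restores the Guid from the 22-character form produced by ToShortString.
    public static Guid FromShortString(string shortId)
    {
        // 16 bytes always encode with exactly two padding characters.
        string base64 = shortId.Replace('-', '+').Replace('_', '/') + "==";
        return new Guid(Convert.FromBase64String(base64));
    }
}
```

Usage would look like `string s = GuidShortener.ToShortString(Guid.NewGuid());` followed later by `Guid g = GuidShortener.FromShortString(s);`.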

The advantage of this approach is that clients may generate their own tiny Uris without a central authority. The downside is hefty length if you roll with Guid, or the need to implement your own globally unique byte stream, which - let's face it - is error prone.

If you do go this route, consider searching for existing schemes that generate globally unique byte streams. Oh, and STAY AWAY FROM RANDOM BYTES, otherwise you will have to build collision resolution ON TOP OF your tiny Uri generator.

Server generates globally unique ids

Again, the primary advantage of the above is that clients may generate their Uris a priori. This is particularly handy if you are about to submit a long-running request you wish to check up on. It may not be particularly relevant to your situation, and may provide only limited value.

So, that aside, a server-centric approach, in which a single authority generates and doles out ids, may be more appealing. If this is the route you choose, then the only question is how long would you like your Uri to be?

Presuming a desired length of 5 characters, and let's say you go with a Base64 encoding, each id may represent up to 5 characters by 6 bits per character (a 64-symbol alphabet carries 6 bits per symbol) equals 30 bits or 2^30 [1 073 741 824] distinct values. That's a fairly large domain. *

Then it becomes a question of returning a value for a given submission. There are probably a great many ways to do this, but I would go with something like this:

  • Enumerate all possible values within a "free list" in your database
  • Remove value from free list when consumed
  • Add value to free list when released

Enhancements or optimizations may include

  • Do not enumerate every value on the range [0, 2^30) (five Base64 characters at 6 bits each), instead enumerate a manageable subset, say 100 000 values at a time, and when all values are consumed, simply generate another 100 000 values in sequence and continue
  • Add an expiry date to values, and recycle expired values end of day
  • Distribute your service, when parallelizing your service simply dole out small mutually exclusive subsets of your free list to distributed services
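The free list with batched enumeration might be sketched as follows. This is an in-memory stand-in for what would really be a database table, and the `IdFreeList` class, its members, and its constructor parameters are all hypothetical. Expiry and distribution across services are left out.

```csharp
using System;
using System.Collections.Generic;

// In-memory stand-in for a database-backed free list of id values.
public class IdFreeList
{
    private readonly Queue<long> free = new Queue<long>();
    private readonly object gate = new object();
    private readonly long batchSize;
    private readonly long maxValue; // exclusive upper bound of the id domain
    private long nextUnissued;      // first value not yet placed on the free list

    public IdFreeList(long batchSize, long maxValue)
    {
        this.batchSize = batchSize;
        this.maxValue = maxValue;
    }

    // Hands out an unused value, enumerating the next batch when the list runs dry.
    public long Acquire()
    {
        lock (gate)
        {
            if (free.Count == 0) Refill();
            if (free.Count == 0) throw new InvalidOperationException("Id domain exhausted.");
            return free.Dequeue();
        }
    }

    // Returns a value to the free list so it can be handed out again.
    // The caller must not release the same value twice.
    public void Release(long value)
    {
        lock (gate) { free.Enqueue(value); }
    }

    // Adds the next batch of sequential values, stopping at the domain limit.
    private void Refill()
    {
        long end = Math.Min(nextUnissued + batchSize, maxValue);
        for (; nextUnissued < end; nextUnissued++) free.Enqueue(nextUnissued);
    }
}
```

For a 5-character Base64 id, something like `new IdFreeList(100000, 1L << 30)` would match the numbers discussed here. Each acquired value would then be encoded to its short string form.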

Conclusion

Bottom line is, you want to guarantee uniqueness - so collisions are a big no-no.


*= With 6 bits per Base64 character, 5 characters give 2^30 = 1 073 741 824 values. This is the size of the raw domain, that is, every value whose encoding needs at most 5 characters. If you are interested in restricting all ids to a minimum and maximum length of 5, then your domain is all values that fit in 5 characters (2^30) less all values that fit in 4 characters (2^24), which is 2^30 - 2^24 = 1 056 964 608, still quite large :)

johnny g answered Oct 19 '22 05:10