
Typical URL lengths for storage calculation purposes (URL-shortener)

After reading several of the hits from a quick Google search, it seems there is little consistency when it comes to determining average URL length.

I know IE has a maximum URL length of 2083 characters (from here) - so I have a good maximum to work with.

My concern is that I am writing a URL-shortener in PHP (similar to some other questions on SO), and want to make sure I am not likely to exceed the storage capability of the server hosting it.

If all URLs were at the IE maximum, then 2^32 of them wouldn't fit comfortably anywhere: it'd take roughly 2 KB x 4 billion ~= 8 TB of storage, an unrealistic expectation.

Without adding a trimming function (i.e., purging "old" shortened URLs), what is the safest way to calculate the storage usage of the app?

Is ~34 characters a safe guess? If so, then a fully-populated database (using an int type for the primary key, so 2^32 rows) would chew about 146 GB of space for the URLs alone, or about 292 GB if that is doubled to allow for any metadata that may need to be stored.
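
The arithmetic behind these figures can be sketched as below. This is only a rough model: it assumes one byte per URL character and treats `overhead_factor` as a hypothetical multiplier standing in for metadata, ignoring index and row overhead in any real database. The function and variable names are illustrative, not part of any library.

```python
def estimate_storage_bytes(url_count, avg_url_bytes, overhead_factor=1.0):
    """Rough storage estimate: rows x average URL size x overhead multiplier."""
    return url_count * avg_url_bytes * overhead_factor


def to_gb(num_bytes):
    """Convert bytes to decimal gigabytes (10^9 bytes)."""
    return num_bytes / 1e9


KEYSPACE = 2 ** 32      # rows addressable by an unsigned 32-bit int key
IE_MAX_URL = 2083       # IE maximum URL length quoted in the question
GUESSED_AVG = 34        # the question's guess for an average URL length

worst_case = estimate_storage_bytes(KEYSPACE, IE_MAX_URL)
guess = estimate_storage_bytes(KEYSPACE, GUESSED_AVG)
guess_with_meta = estimate_storage_bytes(KEYSPACE, GUESSED_AVG, overhead_factor=2.0)

print(f"worst case:        {to_gb(worst_case):,.0f} GB")
print(f"34-char guess:     {to_gb(guess):,.0f} GB")
print(f"guess + metadata:  {to_gb(guess_with_meta):,.0f} GB")
```

Swapping in a different average length or overhead multiplier shows how sensitive the total is to the guess.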

What is the best-guess for an application such as this?

warren asked May 29 '11 16:05




3 Answers

This is probably unknowable without indexing the entire Internet, but according to an analysis by Kelvin Tan of a dataset of 6,627,999 unique URLs from 78,764 unique domains, the mean URL length is 76.97 characters:

Mean: 76.97

Standard Deviation: 37.41

95th% confidence interval: 157

99.5th% confidence interval: 218
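
As a rough illustration of what these numbers would mean for the question's fully-populated 2^32-row table, here is a minimal sketch. It assumes one byte per character, and it assumes the 218 figure can be used as a practical upper bound for a fixed-width column, which is an interpretation rather than something the analysis states. All names are hypothetical.

```python
KEYSPACE = 2 ** 32       # row count from the question (32-bit int key)
MEAN_LEN = 76.97         # mean URL length from the analysis above
UPPER_LEN = 218          # the 99.5 figure above, used here as a column cap (assumption)


def total_gb(rows, bytes_per_row):
    """Total size in decimal gigabytes for `rows` rows of `bytes_per_row` bytes."""
    return rows * bytes_per_row / 1e9


# Variable-width storage: each row costs roughly its own URL length.
variable_width_gb = total_gb(KEYSPACE, MEAN_LEN)

# Fixed-width storage: every row pays for the cap, however short the URL.
fixed_width_gb = total_gb(KEYSPACE, UPPER_LEN)

print(f"variable width at the mean: {variable_width_gb:,.1f} GB")
print(f"fixed width at the cap:     {fixed_width_gb:,.1f} GB")
```

The gap between the two totals is the cost of padding every row to the cap, which is one reason a variable-length column type is usually the better fit for this kind of data.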

Hugh Guiney answered Oct 24 '22 18:10


I'm not sure what is typical, but of the 11,000 URLs in our request database, the average length is 62 characters. There are hundreds of URLs with several hundred characters. The longest is a Google Translate link at 1689 characters.

top 10 len(producturl):
1689
792
707
693
647
606
574
569
562
560

sample url 647 characters:

http://www.amazon.co.jp/%E9%AD%94%E7%95%8C%E6%88%A6%E8%A8%98%E3%83%87%E3%82%A3%E3%82%B9%E3%82%AC%E3%82%A4%E3%82%A24-%E5%88%9D%E5%9B%9E%E9%99%90%E5%AE%9A%E7%89%88-%E5%A0%95%E5%A4%A9%E4%BD%BF%E3%83%95%E3%83%AD%E3%83%B3-%E3%83%97%E3%83%AD%E3%83%80%E3%82%AF%E3%83%88%E3%82%B3%E3%83%BC%E3%83%89%E4%BB%98%E3%81%8D%E7%89%B9%E8%A3%BD%E3%82%AB%E3%83%BC%E3%83%89-%E3%83%88%E3%83%AC%E3%83%BC%E3%83%87%E3%82%A3%E3%83%B3%E3%82%B0%E3%82%AB%E3%83%BC%E3%83%89%E3%80%8C%E3%83%B4%E3%82%A1%E3%82%A4%E3%82%B9%E3%82%B7%E3%83%A5%E3%83%B4%E3%82%A1%E3%83%AB%E3%83%84%E3%80%8D%E9%99%90%E5%AE%9APR%E3%82%AB%E3%83%BC%E3%83%89%E4%BB%98%E3%81%8D/dp/B0043RT8UO/ref=pd_rhf_p_t_1

P.S. For estimating purposes, you should extrapolate from a dataset only after using the standard deviation to throw out the outliers, which could otherwise distort your mean.
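
One way to apply that idea is sketched below: drop any length more than `k` standard deviations from the mean, then recompute the mean. The cutoff `k = 2` and the sample values are assumptions made up for illustration (only the 1689 outlier echoes the list above), and `trimmed_mean` is a hypothetical helper name.

```python
import statistics


def trimmed_mean(lengths, k=2.0):
    """Mean of the values lying within k standard deviations of the raw mean."""
    mean = statistics.mean(lengths)
    sd = statistics.pstdev(lengths)  # population standard deviation
    if sd == 0:
        return mean
    kept = [n for n in lengths if abs(n - mean) <= k * sd]
    # Fall back to the raw mean if the cutoff somehow excludes everything.
    return statistics.mean(kept) if kept else mean


# Hypothetical URL lengths with a single extreme outlier.
sample = [40, 55, 62, 48, 71, 66, 59, 53, 1689]

raw = statistics.mean(sample)
trimmed = trimmed_mean(sample)
print(f"raw mean:     {raw:.2f}")
print(f"trimmed mean: {trimmed:.2f}")
```

A single pass like this is a blunt tool: a very large outlier inflates the standard deviation itself, so percentile-based cutoffs may behave better on heavily skewed data.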

Max Hodges answered Oct 24 '22 18:10


From RFC 2068 section 3.2.1:

The HTTP protocol does not place any a priori limit on the length of a URI. Servers MUST be able to handle the URI of any resource they serve, and SHOULD be able to handle URIs of unbounded length if they provide GET-based forms that could generate such URIs. A server SHOULD return 414 (Request-URI Too Long) status if a URI is longer than the server can handle (see section 10.4.15).

Note: Servers should be cautious about depending on URI lengths above 255 bytes, because some older client or proxy implementations may not properly support these lengths.

Although IE (and probably most other browsers) supports much longer URIs, I don't believe most forms or client-side apps rely on anything above 255 bytes working. Your server logs should provide some statistics about what kind of URLs you are seeing.
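
A minimal sketch of pulling such statistics out of an access log follows. It assumes lines in the common log format, where the request appears as a quoted `"METHOD target HTTP/x.y"` field; the sample lines are invented, and the regular expression and function name are illustrative only. Note that a log records the request target rather than the full URL, so the host and scheme are not counted.

```python
import re
import statistics

# Matches the quoted request field, e.g. "GET /path?x=1 HTTP/1.1".
REQUEST_RE = re.compile(
    r'"(?:GET|POST|HEAD|PUT|DELETE|OPTIONS|PATCH) (\S+) HTTP/[\d.]+"'
)


def url_lengths(log_lines):
    """Return the length of the request target on each parseable log line."""
    lengths = []
    for line in log_lines:
        match = REQUEST_RE.search(line)
        if match:
            lengths.append(len(match.group(1)))
    return lengths


# Hypothetical log lines; the last one is malformed and gets skipped.
lines = [
    '127.0.0.1 - - [29/May/2011:16:05:00 +0000] "GET /index.html HTTP/1.1" 200 512',
    '127.0.0.1 - - [29/May/2011:16:05:01 +0000] "GET /search?q=url+length HTTP/1.1" 200 1024',
    'not a log line',
]

lengths = url_lengths(lines)
over_255 = sum(1 for n in lengths if n > 255)
print(f"count={len(lengths)} mean={statistics.mean(lengths):.1f} "
      f"max={max(lengths)} over_255={over_255}")
```

For a URL shortener, the same counting could be run over the submitted long URLs instead of the request targets, since those are what actually get stored.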

Ted Hopp answered Oct 24 '22 17:10