
Valid characters for directory part of a URL (for short links)

Tags:

http

Are there any other characters except A-Za-z0-9 that can be used to shorten links without getting into trouble? :)

I was thinking about +,;- or something.

Is there a defined standard regarding what characters can be used in a URL that browser vendors respect?

Florian Fida asked Jan 12 '11 14:01


1 Answer

A path segment (each part of a path separated by /) in an absolute URI path can contain zero or more pchar characters, where pchar is defined in RFC 3986 as follows:

  pchar       = unreserved / pct-encoded / sub-delims / ":" / "@"
  pct-encoded = "%" HEXDIG HEXDIG
  unreserved  = ALPHA / DIGIT / "-" / "." / "_" / "~"
  sub-delims  = "!" / "$" / "&" / "'" / "(" / ")"
              / "*" / "+" / "," / ";" / "="

So it’s basically A-Z, a-z, 0-9, -, ., _, ~, !, $, &, ', (, ), *, +, ,, ;, =, :, @, as well as %, which must be followed by two hexadecimal digits. Any other character or byte needs to be percent-encoded.
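To check a candidate short-link segment against this grammar, something like the following minimal sketch might help. The function name is_valid_path_segment is only illustrative, and the regular expression is my own transcription of the pchar rule above, not taken from any library:

```python
import re

# Transcription of the pchar rule: unreserved, sub-delims, ":" and "@"
# as literal characters, or "%" followed by exactly two hex digits.
# (Hypothetical helper, written for illustration only.)
_SEGMENT_RE = re.compile(
    r"(?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})*"
)


def is_valid_path_segment(segment: str) -> bool:
    """Return True if the whole string consists of zero or more pchar."""
    return _SEGMENT_RE.fullmatch(segment) is not None


if __name__ == "__main__":
    for candidate in ["a+b;c", "%7E", "a b", "a/b", "%7"]:
        print(repr(candidate), is_valid_path_segment(candidate))
```

Note that an empty string passes, because a segment may contain zero characters; a short-link service would probably want to require at least one.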

Although these 79 characters can all be used literally in a path segment, some user agents percent-encode some of them anyway (e.g. %7E instead of ~). That’s why many use just the 62 alphanumeric characters (i.e. A-Z, a-z, 0-9) or the Base 64 Encoding with URL and Filename Safe Alphabet (i.e. A-Z, a-z, 0-9, -, _).
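For short links specifically, one common approach is to turn a numeric ID into a string over one of those conservative alphabets. Here is a rough sketch, assuming the link is identified by a non-negative integer such as a database row ID. The names encode_id and decode_id and the ordering of the alphabet are my own choices, not part of any standard:

```python
import string

# 62 alphanumeric characters; the order (digits, upper, lower) is arbitrary.
BASE62 = string.digits + string.ascii_uppercase + string.ascii_lowercase
# Same set plus "-" and "_", i.e. the characters of the URL-safe Base64 alphabet.
BASE64_URL_CHARS = BASE62 + "-_"


def encode_id(n: int, alphabet: str = BASE62) -> str:
    """Write a non-negative integer in positional notation over `alphabet`."""
    if n < 0:
        raise ValueError("only non-negative integers are supported")
    if n == 0:
        return alphabet[0]
    base = len(alphabet)
    digits = []
    while n:
        n, remainder = divmod(n, base)
        digits.append(alphabet[remainder])
    return "".join(reversed(digits))


def decode_id(text: str, alphabet: str = BASE62) -> int:
    """Inverse of encode_id; raises ValueError on characters outside `alphabet`."""
    base = len(alphabet)
    n = 0
    for ch in text:
        n = n * base + alphabet.index(ch)
    return n


if __name__ == "__main__":
    short = encode_id(123456789)
    print(short, decode_id(short))
```

Passing BASE64_URL_CHARS as the alphabet gives slightly shorter strings at the cost of including - and _ in the link.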

Gumbo answered Oct 28 '22 09:10