
What is the semicolon reserved for in URLs?

The RFC 3986 URI: Generic Syntax specification lists a semicolon as a reserved (sub-delim) character:

reserved    = gen-delims / sub-delims
gen-delims  = ":" / "/" / "?" / "#" / "[" / "]" / "@"
sub-delims  = "!" / "$" / "&" / "'" / "(" / ")"
            / "*" / "+" / "," / ";" / "="

What is the reserved purpose of the semicolon (";") in URIs? For that matter, what is the purpose of the other sub-delims? (I'm only aware of purposes for "&", "+", and "=".)

Nicole asked Jan 29 '10 17:01



2 Answers

There is an explanation at the end of section 3.3.

Aside from dot-segments in hierarchical paths, a path segment is considered opaque by the generic syntax. URI producing applications often use the reserved characters allowed in a segment to delimit scheme-specific or dereference-handler-specific subcomponents. For example, the semicolon (";") and equals ("=") reserved characters are often used to delimit parameters and parameter values applicable to that segment. The comma (",") reserved character is often used for similar purposes. For example, one URI producer might use a segment such as "name;v=1.1" to indicate a reference to version 1.1 of "name", whereas another might use a segment such as "name,1.1" to indicate the same. Parameter types may be defined by scheme-specific semantics, but in most cases the syntax of a parameter is specific to the implementation of the URI's dereferencing algorithm.

In other words, the semicolon is reserved so that anyone who wants a delimited list in a URL can safely use ; as the delimiter, even when the items themselves contain ;, as long as those inner semicolons are percent-encoded. For example, you can write this:

foo;bar;baz%3bqux 

and interpret it as three parts: foo, bar, baz;qux. If the semicolon were not a reserved character, ; and %3b would be equivalent, so the URI would be incorrectly interpreted as four parts: foo, bar, baz, qux.
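To make the difference concrete, here is a minimal Python sketch. The function names split_parts and split_parts_wrong are made up for illustration; the only library call is urllib.parse.unquote from the standard library. The point is the order of operations: split on the literal delimiter first, then percent-decode each part.

```python
from urllib.parse import unquote


def split_parts(segment: str) -> list:
    """Split on literal ';' first, then percent-decode each part.

    Hypothetical helper: it treats ';' as a reserved delimiter, so an
    encoded '%3b' inside a part is data, not a separator.
    """
    return [unquote(part) for part in segment.split(";")]


def split_parts_wrong(segment: str) -> list:
    """Decode first, then split: what happens if ';' and '%3b' are
    treated as equivalent. The encoded semicolon becomes a delimiter."""
    return unquote(segment).split(";")


print(split_parts("foo;bar;baz%3bqux"))        # ['foo', 'bar', 'baz;qux']
print(split_parts_wrong("foo;bar;baz%3bqux"))  # ['foo', 'bar', 'baz', 'qux']
```

Decoding before splitting loses the distinction that reserving the character is meant to preserve.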

Mark Byers answered Oct 18 '22 11:10


The intent is clearer if you go back to older versions of the specification, such as RFC 2396:

  path_segments = segment *( "/" segment )
  segment       = *pchar *( ";" param )

Each path segment may include a sequence of parameters, indicated by the semicolon ";" character.
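As a rough illustration of that grammar, the following Python sketch splits a path into segments and each segment into a name plus its parameters. The names parse_segment and parse_path are hypothetical, and reading each parameter as key=value is an assumption based on the "name;v=1.1" convention; the grammar itself only says a parameter is a run of characters.

```python
from urllib.parse import unquote


def parse_segment(segment: str):
    """Split one path segment into (name, params).

    Hypothetical helper. Assumes parameters follow the common
    'key=value' convention; a parameter without '=' gets an empty value.
    """
    name, *raw_params = segment.split(";")
    params = {}
    for raw in raw_params:
        key, _, value = raw.partition("=")
        # Decode only after splitting, so encoded ';' or '=' stay as data.
        params[unquote(key)] = unquote(value)
    return unquote(name), params


def parse_path(path: str):
    """Apply parse_segment to every non-empty '/'-separated segment."""
    return [parse_segment(s) for s in path.split("/") if s]


print(parse_segment("name;v=1.1"))
# ('name', {'v': '1.1'})
print(parse_path("/docs;lang=en/name;v=1.1"))
# [('docs', {'lang': 'en'}), ('name', {'v': '1.1'})]
```

Note that parameters attach to an individual segment, not to the path as a whole, which is why the split on "/" happens before the split on ";".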

I believe it has its origins in FTP URIs.

McDowell answered Oct 18 '22 09:10