Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Semicolon as URL query separator

Although it is strongly recommended (W3C source, via Wikipedia) for web servers to support semicolon as a separator of URL query items (in addition to ampersand), it does not seem to be generally followed.

For example, compare

        http://www.google.com/search?q=nemo&oe=utf-8

        http://www.google.com/search?q=nemo;oe=utf-8

results. (In the latter case, semicolon is, or was at the time of writing this text, treated as ordinary string character, as if the url was: http://www.google.com/search?q=nemo%3Boe=utf-8)

Although the first URL parsing library i tried, behaves well:

>>> from urlparse import urlparse, query_qs
>>> url = 'http://www.google.com/search?q=nemo;oe=utf-8'
>>> parse_qs(urlparse(url).query)
{'q': ['nemo'], 'oe': ['utf-8']}

What is the current status of accepting semicolon as a separator, and what are potential issues or some interesting notes? (from both server and client point of view)

like image 508
mykhal Avatar asked Aug 14 '10 01:08

mykhal


People also ask

Can I have a semicolon in a URL?

Yes, semicolons are valid in URLs. However, if you're plucking them from relatively unstructured prose, it's probably safe to assume a semicolon at the end of a URL is meant as sentence punctuation. The same goes for other sentence-punctuation characters like periods, question marks, quotes, etc..

What is a query URL?

On the internet, a Query string is the part of a link (otherwise known as a hyperlink or a uniform resource locator, URL for short) which assigns values to specified attributes (known as keys or parameters). Typical link containing a query string is as follows: http://example.com/over/there?


2 Answers

The W3C Recommendation from 1999 is obsolete. The current status, according to the 2014 W3C Recommendation, is that semicolon is now illegal as a parameter separator:

To decode application/x-www-form-urlencoded payloads, the following algorithm should be used. [...] The output of this algorithm is a sorted list of name-value pairs. [...]

  1. Let strings be the result of strictly splitting the string payload on U+0026 AMPERSAND characters (&).

In other words, ?foo=bar;baz means the parameter foo will have the value bar;baz; whereas ?foo=bar;baz=sna should result in foo being bar;baz=sna (although technically illegal since the second = should be escaped to %3D).

like image 61
geira Avatar answered Oct 08 '22 08:10

geira


As long as your HTTP server, and your server-side application, accept semicolons as separators, you should be good to go. I cannot see any drawbacks. As you said, the W3C spec is on your side:

We recommend that HTTP server implementors, and in particular, CGI implementors support the use of ";" in place of "&" to save authors the trouble of escaping "&" characters in this manner.

like image 22
Daniel Vassallo Avatar answered Oct 08 '22 07:10

Daniel Vassallo