Although it is strongly recommended (W3C source, via Wikipedia) for web servers to support semicolon as a separator of URL query items (in addition to ampersand), it does not seem to be generally followed.
For example, compare
http://www.google.com/search?q=nemo&oe=utf-8
http://www.google.com/search?q=nemo;oe=utf-8
results. (In the latter case, semicolon is, or was at the time of writing this text, treated as ordinary string character, as if the url was: http://www.google.com/search?q=nemo%3Boe=utf-8)
Although the first URL parsing library i tried, behaves well:
>>> from urlparse import urlparse, query_qs
>>> url = 'http://www.google.com/search?q=nemo;oe=utf-8'
>>> parse_qs(urlparse(url).query)
{'q': ['nemo'], 'oe': ['utf-8']}
What is the current status of accepting semicolon as a separator, and what are potential issues or some interesting notes? (from both server and client point of view)
Yes, semicolons are valid in URLs. However, if you're plucking them from relatively unstructured prose, it's probably safe to assume a semicolon at the end of a URL is meant as sentence punctuation. The same goes for other sentence-punctuation characters like periods, question marks, quotes, etc..
On the internet, a Query string is the part of a link (otherwise known as a hyperlink or a uniform resource locator, URL for short) which assigns values to specified attributes (known as keys or parameters). Typical link containing a query string is as follows: http://example.com/over/there?
The W3C Recommendation from 1999 is obsolete. The current status, according to the 2014 W3C Recommendation, is that semicolon is now illegal as a parameter separator:
To decode application/x-www-form-urlencoded payloads, the following algorithm should be used. [...] The output of this algorithm is a sorted list of name-value pairs. [...]
- Let strings be the result of strictly splitting the string payload on U+0026 AMPERSAND characters (&).
In other words, ?foo=bar;baz
means the parameter foo
will have the value bar;baz
; whereas ?foo=bar;baz=sna
should result in foo
being bar;baz=sna
(although technically illegal since the second =
should be escaped to %3D
).
As long as your HTTP server, and your server-side application, accept semicolons as separators, you should be good to go. I cannot see any drawbacks. As you said, the W3C spec is on your side:
We recommend that HTTP server implementors, and in particular, CGI implementors support the use of ";" in place of "&" to save authors the trouble of escaping "&" characters in this manner.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With