Colon IS an invalid character in URL unless it is used for its purpose (for eg http://). "...Only alphanumerics [0-9a-zA-Z], the special characters "$-_. +! *'()," [not including the quotes - ed], and reserved characters used for their reserved purposes may be used unencoded within a URL."
Yes, semicolons are valid in URLs. However, if you're plucking them from relatively unstructured prose, it's probably safe to assume a semicolon at the end of a URL is meant as sentence punctuation. The same goes for other sentence-punctuation characters like periods, question marks, quotes, etc..
It's just a separator. It doesn't 'mean' or 'specify' anything. In your own example it is also used to separate the scheme from the hostname.
I recently wrote a URL encoder, so this is pretty fresh in my mind.
http://site/gwturl#user:45/comments
All the characters in the fragment part (user:45/comments
) are perfectly legal for RFC 3986 URIs.
The relevant parts of the ABNF:
fragment = *( pchar / "/" / "?" )
pchar = unreserved / pct-encoded / sub-delims / ":" / "@"
unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
pct-encoded = "%" HEXDIG HEXDIG
sub-delims = "!" / "$" / "&" / "'" / "(" / ")"
/ "*" / "+" / "," / ";" / "="
Apart from these restrictions, the fragment part has no defined structure beyond the one your application gives it. The scheme, http, only says that you don't send this part to the server.
EDIT:
D'oh!
Despite my assertions about the URI spec, irreputable provides the correct answer when he points out that the HTML 4 spec restricts element names/identifiers.
Note that identifier rules are changing in HTML 5. URI restrictions will still apply (at time of writing, there are some unresolved issues around HTML 5's use of URIs).
MediaWiki and other wiki engines use colons in their URLs to designate namespaces, with apparently no major problems.
eg http://en.wikipedia.org/wiki/Template:Welcome
In addition to McDowell's analysis on URI standard, remember also that the fragment must be valid HTML anchor name. According to http://www.w3.org/TR/html4/types.html#type-name
ID and NAME tokens must begin with a letter ([A-Za-z]) and may be followed by any number of letters, digits ([0-9]), hyphens ("-"), underscores ("_"), colons (":"), and periods (".").
So you are in luck. ":" is explicitly allowed. And nobody should "%"-escape it, not only because "%" is illegal char there, but also because fragment must match anchor name char-by-char, therefore no agent should try to tamper with them in any way.
However you have to test it. Web standards are not strictly followed, sometimes the standards are conflicting. For example HTTP/1.1 RFC 2616 does not allow query string in the request URL, while HTML constructs one when submitting a form with GET method. Whichever implemented in the real world wins at the end of the day.
I wouldn't count on it. It'll likely get url encoded as %3A
by many user-agents.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With