
Can a URL contain a semicolon and still be valid?

I am using a regular expression to convert plain-text URLs into clickable links.

@(https?://([-\w\.]+)+(:\d+)?(/([\w/_\.-]*(\?\S+)?)?)?)@

However, sometimes in the body of the text, URLs are listed one per line with a semicolon at the end. The real URLs do not contain any ";".

http://www.aaa.org/pressdetail.asp?PRESS_REL_ID=275;
http://www.aaa.org/pressdetail.asp?PRESS_REL_ID=123;
http://www.aaa.org/pressdetail.asp?PRESS_REL_ID=124

Is a semicolon (;) permitted in a URL, or can the semicolon be treated as a marker for the end of a URL? How would that fit into my regular expression?

Vincent asked Jul 24 '09 14:07




1 Answer

A semicolon is a reserved character and should only be used for its special purpose (which depends on the scheme).

Section 2.2 of RFC 1738:

Many URL schemes reserve certain characters for a special meaning: their appearance in the scheme-specific part of the URL has a designated semantics. If the character corresponding to an octet is reserved in a scheme, the octet must be encoded. The characters ";", "/", "?", ":", "@", "=" and "&" are the characters which may be reserved for special meaning within a scheme. No other characters may be reserved within a scheme.
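As for fitting this into the regular expression: since a semicolon can legitimately appear inside a URL, it cannot simply be treated as an end marker. One possible approach is to let the pattern match as before and then trim any semicolons from the very end of each match, leaving them in the text as punctuation. The sketch below uses Python for illustration; `URL_RE` and `linkify` are hypothetical names, and the pattern is a simplified variant of the one in the question, not a complete URL grammar.

```python
import re
from html import escape

# Simplified variant of the pattern from the question (an assumption, not a
# full URL grammar): host, optional port, optional path, optional query.
# The query part (\?\S+) may still swallow a trailing ';'.
URL_RE = re.compile(r"https?://[-\w.]+(?::\d+)?(?:/[\w/.-]*(?:\?\S+)?)?")


def linkify(text):
    """Wrap each URL in an anchor tag, keeping trailing ';' outside the link."""

    def repl(match):
        url = match.group(0)
        # Semicolons inside the URL are kept; only those at the end are cut.
        trimmed = url.rstrip(";")
        trailing = url[len(trimmed):]
        safe = escape(trimmed)  # HTML-escape '&', '<', '>' and quotes
        return '<a href="{0}">{0}</a>{1}'.format(safe, trailing)

    return URL_RE.sub(repl, text)


if __name__ == "__main__":
    sample = (
        "http://www.aaa.org/pressdetail.asp?PRESS_REL_ID=275;\n"
        "http://www.aaa.org/pressdetail.asp?PRESS_REL_ID=123;\n"
    )
    print(linkify(sample))
```

The trade-off is that a URL which really does end in a semicolon would lose it, which seems acceptable for text where the semicolon is used as a list separator.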

Greg answered Oct 05 '22 03:10