I am using a regular expression to convert plain-text URLs to clickable links:
@(https?://([-\w\.]+)+(:\d+)?(/([\w/_\.-]*(\?\S+)?)?)?)@
However, sometimes in the body of the text, URLs are listed one per line with a semicolon at the end. The real URLs do not contain any ";":
http://www.aaa.org/pressdetail.asp?PRESS_REL_ID=275;
http://www.aaa.org/pressdetail.asp?PRESS_REL_ID=123;
http://www.aaa.org/pressdetail.asp?PRESS_REL_ID=124
Is a semicolon (;) permitted in a URL, or can it be treated as a marker of the end of a URL? How would that fit into my regular expression?
Technically, a semicolon is a legal sub-delimiter in a URL string (RFC 3986 lists ";" among its "sub-delims"); plenty of source material is quoted above, including http://www.ietf.org/rfc/rfc3986.txt.
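Because ";" is legal, treating it as the end of a URL is a heuristic that only holds for input like the list in the question. As a minimal sketch in JavaScript, assuming your URLs never contain a semicolon, you could change the query part of the pattern from `\?\S+` to `\?[^\s;]+`. The names `URL_PATTERN` and `linkify` are hypothetical, and the host part is simplified from `([-\w\.]+)+` to `[-\w.]+`, which matches the same text without the nested quantifier.

```javascript
// Hypothetical pattern: same shape as the question's regex, but the query part
// stops at whitespace OR ";" ([^\s;]+ instead of \S+). Assumes ";" never
// belongs to a URL in your input, which is not true of URLs in general.
const URL_PATTERN =
  /(https?:\/\/([-\w.]+)(:\d+)?(\/([\w\/.-]*(\?[^\s;]+)?)?)?)/g;

// Hypothetical helper: wraps every match in an <a> tag.
// Note: it does not HTML-escape the URL (e.g. "&" in the query string).
function linkify(text) {
  return text.replace(URL_PATTERN, '<a href="$1">$1</a>');
}

const input =
  "http://www.aaa.org/pressdetail.asp?PRESS_REL_ID=275;\n" +
  "http://www.aaa.org/pressdetail.asp?PRESS_REL_ID=123;\n" +
  "http://www.aaa.org/pressdetail.asp?PRESS_REL_ID=124";

// With the /g flag, match() returns only the full matches, without the ";".
console.log(input.match(URL_PATTERN));
console.log(linkify(input));
```

The trade-off: a URL that legitimately uses ";" in its query string would be cut short at that character. If such URLs can occur, a gentler variant is to keep `\S+` and strip only a single trailing ";" from each match in a replace callback.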
A colon IS an invalid character in a URL unless it is used for its reserved purpose (e.g. the one in http://). "...Only alphanumerics [0-9a-zA-Z], the special characters "$-_.+!*'()," [not including the quotes - ed], and reserved characters used for their reserved purposes may be used unencoded within a URL."
In URLs, you escape a character as %XX, where XX is the hexadecimal code of the character you want. In JavaScript you can get a correctly escaped string with encodeURIComponent(); the older escape() function also exists, but it is deprecated and does not encode non-ASCII characters as UTF-8.
A semicolon is percent-encoded as %3B in links (hex digits are case-insensitive, so %3b is equivalent).
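A small sketch of the built-in JavaScript functions mentioned above; `http://example.com/a;b` is just a placeholder URL. It shows that encodeURIComponent() turns ";" into %3B, while encodeURI() deliberately leaves reserved characters like ";" untouched because it assumes they are part of the URL's structure.

```javascript
// encodeURIComponent is meant for a single piece of data (e.g. a query value),
// so it percent-encodes reserved characters such as ";".
const encoded = encodeURIComponent("a;b"); // "a%3Bb"

// encodeURI is meant for a whole URL, so reserved characters are kept as-is.
const whole = encodeURI("http://example.com/a;b"); // "http://example.com/a;b"

// Decoding accepts either hex case: "%3b" and "%3B" both give ";".
const decodedLower = decodeURIComponent("a%3bb"); // "a;b"

// escape() also produces "%3B" for ";", but it is deprecated.
const legacy = escape(";"); // "%3B"

console.log(encoded, whole, decodedLower, legacy);
```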
A semicolon is reserved and should only be used for its special purpose (which depends on the scheme).
RFC 1738, Section 2.2:
Many URL schemes reserve certain characters for a special meaning: their appearance in the scheme-specific part of the URL has a designated semantics. If the character corresponding to an octet is reserved in a scheme, the octet must be encoded. The characters ";", "/", "?", ":", "@", "=" and "&" are the characters which may be reserved for special meaning within a scheme. No other characters may be reserved within a scheme.
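To make "the octet must be encoded" concrete, here is a minimal sketch that percent-encodes only the seven characters listed in the quote, building each %XX from the character's code. `RESERVED` and `encodeReserved` are hypothetical names for illustration; in practice encodeURIComponent() already does this and also handles non-ASCII text.

```javascript
// The characters the quoted section says a scheme may reserve.
const RESERVED = new Set([";", "/", "?", ":", "@", "=", "&"]);

// Hypothetical helper: when a reserved character appears as plain data,
// replace it with "%" followed by its two-digit uppercase hex code.
// All seven characters are ASCII, so each is a single octet.
function encodeReserved(value) {
  return Array.from(value, (ch) =>
    RESERVED.has(ch)
      ? "%" + ch.charCodeAt(0).toString(16).toUpperCase().padStart(2, "0")
      : ch
  ).join("");
}

console.log(encodeReserved("id=275;276")); // "id%3D275%3B276"
```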