I am using the following regex to match a URL:
$search = "/([\S]+\.(MUSEUM|TRAVEL|AERO|ARPA|ASIA|COOP|INFO|NAME|BIZ|CAT|COM|INT|JOBS|NET|ORG|PRO|TEL|AC|AD|AE|AF|AG|AI|AL|AM|AN|AO|AQ|AR|AS|AT|AU|au|AW|AX|AZ|BA|BB|BD|BE|BF|BG|BH|BI|BJ|BL|BM|BN|BO|BR|BS|BT|BV|BW|BY|BZ|CA|CC|CD|CF|CG|CH|CI|CK|CL|CM|CN|CO|CR|CU|CV|CX|CY|CZ|DE|DJ|DK|DM|DO|DZ|EC|EDU|EE|EG|EH|ER|ES|ET|EU|FI|FJ|FK|FM|FO|FR|GA|GB|GD|GE|GF|GG|GH|GI|GL|GM|GN|GOV|GP|GQ|GR|GS|GT|GU|GW|GY|HK|HM|HN|HR|HT|HU|ID|IE|IL|IM|IN|IO|IQ|IR|IS|IT|JE|JM|JO|JP|KE|KG|KH|KI|KM|KN|KP|KR|KW|KY|KZ|LA|LB|LC|LI|LK|LR|LS|LT|LU|LV|LY|MA|MC|MD|ME|MF|MG|MH|MIL|MK|ML|MM|MN|MO|MOBI|MP|MQ|MR|MS|MT|MU|MV|MW|MX|MY|MZ|NA|NC|NE|NF|NG|NI|NL|NO|NP|NR|NU|NZ|OM|PA|PE|PF|PG|PH|PK|PL|PM|PN|PR|PS|PT|PW|PY|QA|RE|RO|RS|RU|RW|SA|SB|SC|SD|SE|SG|SH|SI|SJ|SK|SL|SM|SN|SO|SR|ST|SU|SV|SY|SZ|TC|TD|TF|TG|TH|TJ|TK|TL|TM|TN|TO|R|H|TP|TR|TT|TV|TW|TZ|UA|UG|UK|UM|US|UY|UZ|VA|VC|VE|VG|VI|VN|VU|WF|WS|YE|YT|YU|ZA|ZM|ZW)([\S]*))/i";
But its a bit screwed up because it also matches "abc.php" which I dont want. and something like abc...test. I want it to match abc.com though. and www.abc.com as well as http://abc.com.
It just needs a slight tweak at the end but I am not sure what. (there should be a slash after the any domain name which it is not checking for right now and it is only checking \S)
thank you for your time.
@:%_\+~#= , to match the domain/sub domain name. In this solution query string parameters are also taken care. If you are not using RegEx , then from the expression replace \\ by \ . Hope this helps.
URL regular expressions can be used to verify if a string has a valid URL format as well as to extract an URL from a string.
URL Regular Expession tutorial. Regular expressions are a combination of characters that are used to define a search pattern.
In Java, this can be done by using Pattern. matcher(). Find the substring from the first index of match result to the last index of the match result and add this substring into the list. After completing the above steps, if the list is found to be empty, then print “-1” as there is no URL present in the string S.
$search = "#^((?# the scheme: )(?:https?://)(?# second level domains and beyond: )(?:[\S]+\.)+((?# top level domains: )MUSEUM|TRAVEL|AERO|ARPA|ASIA|EDU|GOV|MIL|MOBI|(?# )COOP|INFO|NAME|BIZ|CAT|COM|INT|JOBS|NET|ORG|PRO|TEL|(?# )A[CDEFGILMNOQRSTUWXZ]|B[ABDEFGHIJLMNORSTVWYZ]|(?# )C[ACDFGHIKLMNORUVXYZ]|D[EJKMOZ]|(?# )E[CEGHRSTU]|F[IJKMOR]|G[ABDEFGHILMNPQRSTUWY]|(?# )H[KMNRTU]|I[DELMNOQRST]|J[EMOP]|(?# )K[EGHIMNPRWYZ]|L[ABCIKRSTUVY]|M[ACDEFGHKLMNOPQRSTUVWXYZ]|(?# )N[ACEFGILOPRUZ]|OM|P[AEFGHKLMNRSTWY]|QA|R[EOSUW]|(?# )S[ABCDEGHIJKLMNORTUVYZ]|T[CDFGHJKLMNOPRTVWZ]|(?# )U[AGKMSYZ]|V[ACEGINU]|W[FS]|Y[ETU]|Z[AMW])(?# the path, can be there or not: )(/[a-z0-9\._/~%\-\+&\#\?!=\(\)@]*)?)$#i";
Just cleaned up a bit. This will match only HTTP(s) addresses, and, as long as you copied all top level domains correctly from IANA, only those standardized (it will not match http://localhost
) and with the http://
declared.
Finally you should end with the path part, that will always start with a /, if it is there.
However, I'd suggest to follow Cerebrus: If you're not sure about this, learn regexps in a more gentle way and use proven patterns for complicated tasks.
Cheers,
By the way: Your regexp will also match something.r
and something.h
(between |TO| and |TR| in your example). I left them out in my version, as I guess it was a typo.
On re-reading the question: Change
)(?:https?://)(?#
to
)(?:https?://)?(?#
(there is a ?
extra) to match 'URLs' without the scheme.
Not exactly what the OP asked for but this is a much simpler regular expression that does not need to be updated each time the IANA introduces a new TLD. I believe this is more adequate for most simple needs:
^(?:https?://)?(?:[\w]+\.)(?:\.?[\w]{2,})+$
no list of TLD, localhost is not matched, the number of subparts must be >= 2 and the length of each subpart must be >= 2 (fx: "a.a" will not match but "a.ab" will match).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With