
Website/URL Validation Regex in JAVA

I need a regex that matches URLs starting with "http://", "https://", or "www.", as well as bare domains such as "google.com".

The code I tried is:

//Pattern to check if this is a valid URL address
    Pattern p = Pattern.compile("(http://|https://)(www.)?([a-zA-Z0-9]+).[a-zA-Z0-9]*.[a-z]{3}.?([a-z]+)?");
    Matcher m;
    m=p.matcher(urlAddress);

But this pattern only matches URLs such as "http://www.google.com".

I know this may be a duplicate question, but I have tried all of the regexes provided and none of them suits my requirement. Will someone please help me? Thank you.

Hao Ting asked Jul 24 '14 02:07


4 Answers

You need to make the (http://|https://) part of your regex optional:

^(http:\/\/|https:\/\/)?(www.)?([a-zA-Z0-9]+).[a-zA-Z0-9]*.[a-z]{3}.?([a-z]+)?$

Avinash Raj answered Sep 30 '22 20:09


You can use the Apache Commons Validator library (org.apache.commons.validator.routines.UrlValidator) to validate a URL:

String[] schemes = {"http", "https"};
UrlValidator urlValidator = new UrlValidator(schemes);

And use it like this:

boolean valid = urlValidator.isValid(url); // url is the String you want to check

Then no regex is needed.

Link: https://commons.apache.org/proper/commons-validator/apidocs/org/apache/commons/validator/routines/UrlValidator.html

Raj Hassani answered Sep 30 '22 20:09


If you use Java, I recommend this regex (I wrote it myself):

^(https?:\/\/)?(www\.)?([\w]+\.)+[\w]{2,63}\/?$
"^(https?:\\/\\/)?(www\\.)?([\\w]+\\.)+[\\w]{2,63}\\/?$" // as a Java string

To explain:

  • ^ = start of line
  • (https?:\/\/)? = "http://" or "https://" may occur.
  • (www\.)? = "www." may occur.
  • ([\w]+\.)+ = a word (\w, i.e. [a-zA-Z0-9_]) followed by a dot has to occur one or more times. (Extend the character class here if you need special characters like ü, ä, ö or others in your URL; remember to use IDN.toASCII(url) if you use special characters. If you need to know which characters are legal in general, see https://kb.ucla.edu/articles/what-characters-can-go-into-a-valid-http-url)
  • [\w]{2,63} = a word of 2 to 63 characters has to occur exactly once. (A TLD (top-level domain, for example .com) cannot be shorter than 2 or longer than 63 characters.)
  • \/? = a "/" character may occur. (Some people or servers put a / at the end.)
  • $ = end of line
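A minimal sketch of how the pattern explained above could be used from Java. The class name SimpleUrlRegexSketch and the method looksLikeUrl are made up for illustration. The slashes are left unescaped because Java regex does not need them escaped.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, only for illustration.
final class SimpleUrlRegexSketch {
    // Same pattern as above as a Java string literal: every regex backslash is doubled.
    static final Pattern URL =
            Pattern.compile("^(https?://)?(www\\.)?(\\w+\\.)+\\w{2,63}/?$");

    static boolean looksLikeUrl(String candidate) {
        // matches() only succeeds if the whole input matches the pattern.
        return candidate != null && URL.matcher(candidate).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "http://www.google.com", // true
            "https://google.com",    // true
            "www.google.com",        // true
            "google.com",            // true
            "test..com",             // false: two dots in a row
            "google"                 // false: no dot, so no TLD
        };
        for (String s : samples) {
            System.out.println(s + " -> " + looksLikeUrl(s));
        }
    }
}
```

Compiling the Pattern once in a static field avoids recompiling it on every call.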

-

If you extend it with special characters, it could look like this:

^(https?:\/\/)?(www\.)?([\w\Q$-_+!*'(),%\E]+\.)+[\w]{2,63}\/?$
"^(https?:\\/\\/)?(www\\.)?([\\w\\Q$-_+!*'(),%\\E]+\\.)+[\\w]{2,63}\\/?$" // as a Java string

Avinash Raj's answer is not fully correct.

^(http:\/\/|https:\/\/)?(www.)?([a-zA-Z0-9]+).[a-zA-Z0-9]*.[a-z]{3}.?([a-z]+)?$

The dots are not escaped, which means each of them matches any character. My version is also simpler, and I have never heard of a domain like "test..com" (which that pattern actually matches).
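To make the difference concrete, here is a small sketch that runs both patterns against the same inputs. The class name DotEscapeComparison and the input "googleXcom" are made up for the comparison.

```java
import java.util.regex.Pattern;

// Hypothetical comparison class, only for illustration.
final class DotEscapeComparison {
    // Pattern with unescaped dots: each '.' matches any single character.
    static final Pattern UNESCAPED = Pattern.compile(
            "^(http://|https://)?(www.)?([a-zA-Z0-9]+).[a-zA-Z0-9]*.[a-z]{3}.?([a-z]+)?$");

    // Pattern with escaped dots: "\\." matches only a literal dot.
    static final Pattern ESCAPED = Pattern.compile(
            "^(https?://)?(www\\.)?(\\w+\\.)+\\w{2,63}/?$");

    static boolean matches(Pattern p, String s) {
        return p.matcher(s).matches();
    }

    public static void main(String[] args) {
        // "googleXcom" contains no dot at all; "test..com" has two in a row.
        for (String s : new String[] {"google.com", "test..com", "googleXcom"}) {
            System.out.println(s
                    + " unescaped=" + matches(UNESCAPED, s)
                    + " escaped=" + matches(ESCAPED, s));
        }
        // google.com  unescaped=true escaped=true
        // test..com   unescaped=true escaped=false
        // googleXcom  unescaped=true escaped=false
    }
}
```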

Demo: https://regex101.com/r/vM7wT6/279


Edit: Since some people needed a regex that also matches server directories, I wrote this:

^(https?:\/\/)?([\w\Q$-_+!*'(),%\E]+\.)+(\w{2,63})(:\d{1,4})?([\w\Q/$-_+!*'(),%\E]+\.?[\w])*\/?$

This may not be the best one, since I didn't spend much time on it, but maybe it helps someone. You can see how it works here: https://regex101.com/r/vM7wT6/700. It also matches URLs like "hello.to/test/whatever.cgi".
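A rough sketch of the directory-matching pattern in Java. One assumption to flag: the \Q...\E quoting is written out here as an explicit character class (\w plus $ - + ! * ' ( ) , %), which should accept the same characters. The class name UrlWithPathSketch and the method looksLikeUrl are made up.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, only for illustration.
final class UrlWithPathSketch {
    // Host labels: word characters plus $ - + ! * ' ( ) , %
    // Path segments: the same characters plus '/'
    static final Pattern URL = Pattern.compile(
            "^(https?://)?"                       // optional scheme
            + "([\\w$\\-+!*'(),%]+\\.)+"          // one or more labels, each followed by a dot
            + "(\\w{2,63})"                       // TLD
            + "(:\\d{1,4})?"                      // optional port
            + "([\\w/$\\-+!*'(),%]+\\.?\\w)*"     // optional path
            + "/?$");                             // optional trailing slash

    static boolean looksLikeUrl(String candidate) {
        return candidate != null && URL.matcher(candidate).matches();
    }

    public static void main(String[] args) {
        System.out.println(looksLikeUrl("hello.to/test/whatever.cgi"));          // true
        System.out.println(looksLikeUrl("https://example.com:8080/index.html")); // true
        System.out.println(looksLikeUrl("http://www.google.com"));               // true
        System.out.println(looksLikeUrl("hello"));                               // false: no dot
    }
}
```

The nested quantifiers in the path part can backtrack heavily on long inputs that do not match, so it may be worth limiting the input length before applying this.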

KnechtRootrecht answered Sep 30 '22 20:09


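A Java-compatible version of @Avinash's answer would be:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Pattern to check if this is a valid URL address
Pattern p = Pattern.compile("^(http://|https://)?(www.)?([a-zA-Z0-9]+).[a-zA-Z0-9]*.[a-z]{3}.?([a-z]+)?$");
Matcher m = p.matcher(urlAddress);
// matches() returns true only if the entire string matches the pattern
boolean matches = m.matches();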
raghavsood33 answered Sep 30 '22 19:09