
Valid URL characters: how to validate in Java

Tags: java, url

These strings are good: 'www.test.com', 'www.888.com', 'stackoverflow.com', 'GOoGle.Com'.

Why? Because they are valid URLs. It does not necessarily matter whether they have been registered or not.

Bad strings are:

'goog*d\x' 'manydots...com'

Why? Because you can't register those URLs.

If I have a string in Java that is supposed to be a good URL, what's the best way to validate it?

Thanks a lot

Chez asked Apr 08 '10 16:04



2 Answers

Use UrlValidator from the Apache Commons Validator library. Binary package: http://www.mirrorservice.org/sites/ftp.apache.org/commons/validator/binaries/commons-validator-1.3.1.zip (the zip contains the .jar files).

Example of usage (Construct a UrlValidator with valid schemes of "http", and "https"):

import org.apache.commons.validator.UrlValidator; // package as of 1.3.1

String[] schemes = {"http","https"};
UrlValidator urlValidator = new UrlValidator(schemes);
if (urlValidator.isValid("ftp://foo.bar.com/")) {
   System.out.println("url is valid");
} else {
   System.out.println("url is invalid");
}

This prints "url is invalid", because "ftp" is not among the allowed schemes.

If the default constructor is used instead:

UrlValidator urlValidator = new UrlValidator();
if (urlValidator.isValid("ftp://foo.bar.com/")) {
   System.out.println("url is valid");
} else {
   System.out.println("url is invalid");
}

This prints "url is valid", because the default constructor accepts the http, https and ftp schemes.

Chris Dennett answered Oct 02 '22 10:10


Those examples are hostnames. They're not valid URLs in themselves.

Hostnames are made of .-separated ‘labels’. Each label is 1 to 63 characters long and consists of letters, digits and hyphens, but a hyphen must not be the first or last character. The whole hostname may optionally be followed by another dot.

You can match this with a pattern like (assuming case-insensitive):

([a-z0-9]|[a-z0-9][a-z0-9\-]{0,61}[a-z0-9])(\.([a-z0-9]|[a-z0-9][a-z0-9\-]{0,61}[a-z0-9]))*\.?

However, this also matches strings like 1.2.3.4, which technically could be host/domain names but in practice act as direct IP addresses. You may want to allow that. If you do, you may also want to allow IPv6 addresses, which are colon-separated hex; when embedded in a URL, they also have square brackets around them.
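
As a rough illustration of the label rules and the dotted-quad caveat above, here is a minimal sketch using only java.util.regex. The class and method names (HostnameCheck, isHostname, looksLikeIpv4) are made up for this example. The label pattern is an equivalent restatement of the one above, with the dot grouped together with each following label, and the IPv4 check only looks at the shape of the string, not at the numeric range of each part.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of any library.
class HostnameCheck {
    // One label: 1-63 letters, digits or hyphens, no leading or trailing hyphen.
    private static final String LABEL = "[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?";

    // Labels separated by single dots, with an optional trailing dot.
    private static final Pattern HOSTNAME = Pattern.compile(
            LABEL + "(?:\\." + LABEL + ")*\\.?", Pattern.CASE_INSENSITIVE);

    // Four dot-separated runs of 1-3 digits, e.g. 1.2.3.4 (shape only, no range check).
    private static final Pattern DOTTED_QUAD =
            Pattern.compile("\\d{1,3}(?:\\.\\d{1,3}){3}");

    static boolean isHostname(String s) {
        // matches() requires the whole string to fit the pattern.
        return s != null && HOSTNAME.matcher(s).matches();
    }

    static boolean looksLikeIpv4(String s) {
        return s != null && DOTTED_QUAD.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "www.test.com", "www.888.com", "stackoverflow.com", "GOoGle.Com",
            "goog*d\\x", "manydots...com", "1.2.3.4"
        };
        for (String s : samples) {
            System.out.println(s + " hostname=" + isHostname(s)
                    + " ipv4-like=" + looksLikeIpv4(s));
        }
    }
}
```

Whether a string that passes both checks should be accepted or rejected is a policy decision left to the caller. This sketch does not handle IPv6 literals or IDNA names.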

And then of course there's IDNA. Nowadays, 例え.テスト is a valid IDNA domain name, corresponding to xn--r8jz45g.xn--zckzah. If you want to allow those you'll need some Unicode support.
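
For the Unicode side, the JDK ships java.net.IDN, which converts between the Unicode and ASCII (xn--) forms of a name. The sketch below is only an illustration: the class and method names (IdnExample, toAsciiHostname) are invented here, and the string is written with Unicode escapes that spell 例え.テスト so the source file stays plain ASCII. Note that java.net.IDN follows the older IDNA 2003 rules, so newer IDNA behaviour may differ for some names.

```java
import java.net.IDN;

// Hypothetical helper for illustration; not part of any library.
class IdnExample {
    // Returns the ASCII (Punycode) form of a hostname, or null if it cannot be converted.
    static String toAsciiHostname(String unicodeHost) {
        try {
            return IDN.toASCII(unicodeHost);
        } catch (IllegalArgumentException e) {
            // Thrown when the input does not conform to the IDNA rules.
            return null;
        }
    }

    public static void main(String[] args) {
        // The escapes below spell the Japanese test domain from the text above.
        String host = "\u4f8b\u3048.\u30c6\u30b9\u30c8";
        System.out.println(toAsciiHostname(host));
    }
}
```

One possible design is to run this conversion first and then apply an ASCII-only hostname check to the result, so the label rules only need to be written once.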

Summary: it's quite a bit more difficult than you might think. And that's just hostnames. ‘Validating’ a whole URL is even more work. A simple regex isn't going to hack it. Use a pre-existing library.

bobince answered Oct 02 '22 10:10