I have the following regex to check whether a URL is valid:
preg_match('/^(http(s?):\/\/)?(www\.)?+[a-zA-Z0-9\.\-\_]+(\.[a-zA-Z]{2,3})+(\/[a-zA-Z0-9\_\-\s\.\/\?\%\#\&\=]*)?$/i', $url);
I would like to modify this part: [a-zA-Z0-9\.\-\_]+(\.[a-zA-Z]{2,3})
(at least I hope that is the right part) so that it matches either an IP address or the existing domain pattern.
At the moment the regex works well for me, as it catches the bad URLs correctly. However, I believe it will start to fail once the new domain policy from ICANN goes live (e.g. Google may want to have the URL http://search.google instead of http://google.com for search).
Anyhow, I'd like to allow IP addresses as valid URLs too, but I'm unsure how to factor that into the regex.
If anyone could lend a hand, then that would be great!
// Regex for digit from 0 to 255. // followed by a dot, repeat 4 times. // this is the regex to validate an IP address. = zeroTo255 + "\\."
You can use the URL constructor to check whether a string is a valid URL. The constructor ( new URL(url) ) returns a newly created URL object defined by the URL parameters. A JavaScript TypeError exception is thrown if the given URL is not valid.
Match the given URL with the regular expression. In Java, this can be done by using Pattern.matcher(). Return true if the URL matches the given regular expression, else return false.
This regex seems to work:
^(http(s?):\/\/)?(((www\.)?+[a-zA-Z0-9\.\-\_]+(\.[a-zA-Z]{2,3})+)|(\b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b))(\/[a-zA-Z0-9\_\-\s\.\/\?\%\#\&\=]*)?$
After the check for "http", it uses alternation (an OR) to match either a domain name or an IP address. Here is the relevant excerpt:
((www\.)?+[a-zA-Z0-9\.\-\_]+(\.[a-zA-Z]{2,3})+)|(\b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b)
The IP expression is somewhat long, but it makes sure that the address is a valid IP (as in, not 999.999.999.999). You can easily replace it with another IP check.
Here it is incorporated into your earlier code:
preg_match('/^(http(s?):\/\/)?(((www\.)?+[a-zA-Z0-9\.\-\_]+(\.[a-zA-Z]{2,3})+)|(\b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b))(\/[a-zA-Z0-9\_\-\s\.\/\?\%\#\&\=]*)?$/i', $url);
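For anyone who wants to try the alternation outside PHP, here is a minimal Python sketch of the same idea. The names `OCTET`, `IP`, `DOMAIN`, `PATH`, `URL_RE` and `is_valid_url` are made up for illustration. The pattern is split into pieces for readability, and the possessive `?+` is replaced by a plain `?` because older Python versions do not support possessive quantifiers. Treat it as an approximation of the regex above, not an exact port.

```python
import re

# One octet in the range 0-255, as in the IP part of the regex above.
OCTET = r"(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)"
# Four octets separated by dots.
IP = r"\b(?:" + OCTET + r"\.){3}" + OCTET + r"\b"
# Domain alternative: optional "www.", a name, then one or more 2-3 letter suffixes.
DOMAIN = r"(?:www\.)?[a-zA-Z0-9._-]+(?:\.[a-zA-Z]{2,3})+"
# Optional path / query / fragment using the same character set as the original.
PATH = r"(?:/[a-zA-Z0-9_\-\s./?%#&=]*)?"

URL_RE = re.compile(
    r"^(?:https?://)?(?:" + DOMAIN + "|" + IP + ")" + PATH + "$",
    re.IGNORECASE,
)


def is_valid_url(url):
    """Return True if url matches either the domain form or the IPv4 form."""
    return URL_RE.match(url) is not None


if __name__ == "__main__":
    for candidate in ("http://www.example.com/path?x=1",
                      "192.168.0.1",
                      "http://999.999.999.999",
                      "http://search.google"):
        print(candidate, is_valid_url(candidate))
```

Note that `http://search.google` is rejected by this sketch because of the 2-3 letter suffix limit, which is exactly the concern raised in the question.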
Two points. Top-level domains now seem to max out at 6 characters (museum), so we need to account for that:
^(http(s?):\/\/)?(((www\.)?+[a-zA-Z0-9\.\-\_]+(\.[a-zA-Z]{2,6})+)|(\b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b))(\/[a-zA-Z0-9\_\-\s\.\/\?\%\#\&\=]*)?$
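In C-based languages we also need to escape the backslashes. The PHP-style /.../i delimiters are not part of the pattern itself, so they are dropped here; case-insensitive matching has to be requested through whatever option the regex library provides:
const char *regex = "^(http(s?):\\/\\/)?(www\\.)?+[a-zA-Z0-9\\.\\-\\_]+(\\.[a-zA-Z]{2,6})+(\\/[a-zA-Z0-9\\_\\-\\s\\.\\/\\?\\%\\#\\&\\=]*)?$";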
In Objective-C we can define a category method on NSString:
- (BOOL)isURL
{
    // uses ICU regex syntax http://userguide.icu-project.org/strings/regexp
    NSString *regex = @"^(http(s?)://)?(www\\.)?+[a-zA-Z0-9\\.\\-_]+(\\.[a-zA-Z]{2,6})+(/[a-zA-Z0-9_\\-\\s\\./\\?%#\\&=]*)?$";
    NSPredicate *regextest = [NSPredicate predicateWithFormat:@"SELF MATCHES %@", regex];
    return [regextest evaluateWithObject:self];
}
Note that this solution completely ignores IPv6!
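If IPv6 matters, one option is to stop matching the address with a regex and hand the host to a parser instead. The sketch below does this in Python with the standard `urllib.parse` and `ipaddress` modules. The function name `host_is_ip` is hypothetical, and prepending `http://` when no scheme is present is an assumption made only so that the host can be extracted. It checks only whether the host is a literal IPv4 or IPv6 address, not whether the rest of the URL is valid.

```python
import ipaddress
from urllib.parse import urlsplit


def host_is_ip(url):
    """Return True if the URL's host is a literal IPv4 or IPv6 address."""
    # urlsplit only recognizes a host after "//", so add a scheme if missing.
    if "://" not in url:
        url = "http://" + url
    try:
        # .hostname strips the square brackets around an IPv6 literal.
        host = urlsplit(url).hostname
    except ValueError:
        # Raised for malformed bracketed hosts such as "http://[::1".
        return False
    if host is None:
        return False
    try:
        ipaddress.ip_address(host)
    except ValueError:
        return False
    return True
```

A check like this could be combined with the domain regex: accept the URL if either the host parses as an IP address or the domain pattern matches.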