Alter regex to allow IP address when checking URL?

Tags: regex, url
I have the following regex to check to see if a URL is valid:

preg_match('/^(http(s?):\/\/)?(www\.)?+[a-zA-Z0-9\.\-\_]+(\.[a-zA-Z]{2,3})+(\/[a-zA-Z0-9\_\-\s\.\/\?\%\#\&\=]*)?$/i', $url);

I'd like to modify this part, [a-zA-Z0-9\.\-\_]+(\.[a-zA-Z]{2,3}) (at least I hope this is the right part), so that it matches either an IP address or the existing host pattern.

At the moment, the regex works well for me, as it correctly catches bad URLs, though I believe it will start failing once the new domain policy from ICANN goes live (e.g. Google may want to use the URL http://search.google instead of http://google.com for search).

Anyhow, I'd like to allow IP addresses as valid URLs too, but I'm unsure how to factor that into the regex.

If anyone could lend a hand, then that would be great!

MrJ asked Nov 23 '11 21:11

2 Answers

This regex seems to work:

^(http(s?):\/\/)?(((www\.)?+[a-zA-Z0-9\.\-\_]+(\.[a-zA-Z]{2,3})+)|(\b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b))(\/[a-zA-Z0-9\_\-\s\.\/\?\%\#\&\=]*)?$

After the optional "http" check, it uses alternation (|) to match either a domain name or an IP address. Here is the relevant excerpt:

((www\.)?+[a-zA-Z0-9\.\-\_]+(\.[a-zA-Z]{2,3})+)|(\b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b)

The IP expression is somewhat long, but it makes sure that the address is a valid IP (as in, not 999.999.999.999). You can easily substitute another IP check for it.

Here it is incorporated into your earlier code:

preg_match('/^(http(s?):\/\/)?(((www\.)?+[a-zA-Z0-9\.\-\_]+(\.[a-zA-Z]{2,3})+)|(\b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b))(\/[a-zA-Z0-9\_\-\s\.\/\?\%\#\&\=]*)?$/i', $url);
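
A quick way to sanity-check the combined pattern is to run it over a few inputs. The sketch below wraps the pattern from this answer in a helper called `isValidUrl`; that name and the sample URLs are assumptions made for illustration, not part of the original code.

```php
<?php
// Hypothetical helper (name chosen for this example) wrapping the combined pattern.
function isValidUrl(string $url): bool
{
    $pattern = '/^(http(s?):\/\/)?(((www\.)?+[a-zA-Z0-9\.\-\_]+(\.[a-zA-Z]{2,3})+)|(\b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b))(\/[a-zA-Z0-9\_\-\s\.\/\?\%\#\&\=]*)?$/i';

    // preg_match returns 1 on a match, 0 on no match, and false on error.
    return preg_match($pattern, $url) === 1;
}

// Illustrative inputs only: a domain, an IP with a path, and two out-of-range IPs.
$samples = [
    'http://www.example.com',
    'http://192.168.1.1/admin',
    '999.999.999.999',
    'http://256.1.1.1',
];

foreach ($samples as $sample) {
    echo $sample, ' => ', isValidUrl($sample) ? 'valid' : 'invalid', PHP_EOL;
}
```

The first two samples should be reported as valid and the last two as invalid, since neither alternative accepts an octet above 255 and the domain branch requires an alphabetic TLD.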

voithos answered Sep 30 '22 01:09


Two points. Top-level domains now seem to max out at 6 characters (e.g. museum), so we need to account for that:

^(http(s?):\/\/)?(((www\.)?+[a-zA-Z0-9\.\-\_]+(\.[a-zA-Z]{2,6})+)|(\b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b))(\/[a-zA-Z0-9\_\-\s\.\/\?\%\#\&\=]*)?$
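
To see the effect of widening the TLD quantifier, the sketch below builds the same pattern with the upper bound as a parameter and tries it against a .museum host. The helper name `urlPattern` and the sample URL are assumptions made for this example.

```php
<?php
// Hypothetical helper: returns the URL pattern with a configurable TLD length cap.
function urlPattern(int $maxTldLength): string
{
    return '/^(http(s?):\/\/)?(((www\.)?+[a-zA-Z0-9\.\-\_]+(\.[a-zA-Z]{2,'
        . $maxTldLength
        . '})+)|(\b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b))(\/[a-zA-Z0-9\_\-\s\.\/\?\%\#\&\=]*)?$/i';
}

$url = 'http://example.museum'; // illustrative input

// With {2,3} the six-letter TLD cannot match; with {2,6} it can.
var_dump(preg_match(urlPattern(3), $url) === 1); // bool(false)
var_dump(preg_match(urlPattern(6), $url) === 1); // bool(true)
```

The cap is still hard-coded, so any TLD longer than the chosen bound would be rejected in the same way.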

In C-based languages, the backslashes themselves need to be escaped:

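// The PHP-style /.../i delimiters and flag are not part of the pattern itself;
// request case-insensitive matching through your regex library's options instead.
const char *regex = "^(http(s?):\\/\\/)?(www\\.)?+[a-zA-Z0-9\\.\\-\\_]+(\\.[a-zA-Z]{2,6})+(\\/[a-zA-Z0-9\\_\\-\\s\\.\\/\\?\\%\\#\\&\\=]*)?$";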

In Objective-C we can define a category method on NSString:

- (BOOL)isURL
{
    // uses ICU regex syntax http://userguide.icu-project.org/strings/regexp
    NSString *regex = @"^(http(s?)://)?(www\\.)?+[a-zA-Z0-9\\.\\-_]+(\\.[a-zA-Z]{2,6})+(/[a-zA-Z0-9_\\-\\s\\./\\?%#\\&=]*)?$";

    NSPredicate *regextest = [NSPredicate predicateWithFormat:@"SELF MATCHES %@", regex];
    return [regextest evaluateWithObject:self];
}

Note that this solution completely ignores IPv6!
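
If IPv6 hosts matter, one option in PHP is to check the host part separately with the built-in `filter_var` rather than extending the regex. This is a minimal sketch of that alternative, not the approach used in the answers above; the helper name `isIpHost` is an assumption, and it expects the host to have been extracted from the URL already.

```php
<?php
// Hypothetical helper: true when $host is an IPv4 literal or a bracketed IPv6 literal.
function isIpHost(string $host): bool
{
    // In URLs, IPv6 literals are wrapped in brackets, e.g. http://[::1]/
    if (strlen($host) > 2 && $host[0] === '[' && substr($host, -1) === ']') {
        $inner = substr($host, 1, -1);
        return filter_var($inner, FILTER_VALIDATE_IP, FILTER_FLAG_IPV6) !== false;
    }

    // Anything without brackets is only accepted as IPv4.
    return filter_var($host, FILTER_VALIDATE_IP, FILTER_FLAG_IPV4) !== false;
}

// Illustrative inputs only.
foreach (['192.168.1.1', '[::1]', '[2001:db8::1]', '999.999.999.999', 'example.com'] as $host) {
    echo $host, ' => ', isIpHost($host) ? 'IP literal' : 'not an IP literal', PHP_EOL;
}
```

A check like this could run alongside the domain-name branch of the regex, which would then no longer need its own IP alternative.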

Jonathan Mitchell answered Sep 29 '22 23:09