How to validate a domain name using Regex & PHP?

I want a solution that validates only domain names, not full URLs. The following examples show what I'm looking for:

domain.com -> true
domain.net -> true
domain.org -> true
domain.biz -> true
domain.co.uk -> true
sub.domain.com -> true
domain.com/folder -> false
domµ*$ain.com -> false

CodeOverload asked Jun 12 '10 00:06


1 Answer

The accepted answer is incomplete/wrong.

The regex pattern:

  • should NOT validate domains such as:
    -domain.com, domain--.com, -domain-.-.com, domain.000, etc...

  • should validate domains such as:
    schools.k12, newTLD.clothing, good.photography, etc...

After some further research, below is the most correct, cross-language and compact pattern I could come up with:

^(?!\-)(?:(?:[a-zA-Z\d][a-zA-Z\d\-]{0,61})?[a-zA-Z\d]\.){1,126}(?!\d+)[a-zA-Z\d]{1,63}$
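A minimal Python sketch of applying the pattern above (the answer says it also works in Python), checked against the question's examples. Two choices here belong to this sketch, not to the answer. First, `re.fullmatch` is used because a bare `$` also matches just before a trailing newline, so `"domain.com\n"` would otherwise pass. The same applies to PHP's `preg_match()` unless the `D` modifier is set. Second, `re.ASCII` limits `\d` to `0-9`, because without it Python's `\d` accepts any Unicode decimal digit. `matches_pattern` is a hypothetical helper name.

```python
import re

# The answer's pattern, copied verbatim into a raw string.
# re.ASCII is an assumption of this sketch: it keeps \d to 0-9.
DOMAIN_RE = re.compile(
    r"^(?!\-)(?:(?:[a-zA-Z\d][a-zA-Z\d\-]{0,61})?[a-zA-Z\d]\.){1,126}"
    r"(?!\d+)[a-zA-Z\d]{1,63}$",
    re.ASCII,
)


def matches_pattern(name: str) -> bool:
    """Hypothetical helper: True if the whole string matches the pattern.

    fullmatch() requires the match to span the entire string, so a
    trailing newline (which a bare `$` tolerates) is rejected.
    """
    return DOMAIN_RE.fullmatch(name) is not None


if __name__ == "__main__":
    examples = [
        "domain.com", "domain.net", "domain.org", "domain.biz",
        "domain.co.uk", "sub.domain.com",      # expected to match
        "domain.com/folder", "domµ*$ain.com",  # expected not to match
    ]
    for name in examples:
        print(f"{name!r:24} {matches_pattern(name)}")
```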

This pattern conforms to most of the rules defined in the specs (the overall length limit is checked separately; see Note 1):

  • Each label/level (separated by dots) may contain up to 63 characters.
  • The full domain name may have up to 127 levels.
  • The full domain name may not exceed the length of 253 characters in its textual representation.
  • Each label can consist of letters, digits and hyphens.
  • Labels cannot start or end with a hyphen.
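  • The top-level domain (extension) cannot be all-numeric. As written, the (?!\d+) lookahead is stricter than this: it rejects any TLD that starts with a digit, so domain.3com fails too.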
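To make the listed rules concrete, here is a regex-free Python sketch that checks them one at a time, assuming ASCII-only names (IDNs are out of scope, as noted below). `follows_listed_rules` is a hypothetical name. Because it follows the bullets literally, it is slightly more permissive than the regex. It accepts a single-label name such as `localhost`, a TLD that starts with a digit such as `3com`, and a TLD containing a hyphen. The regex rejects all three.

```python
import string

# Characters allowed in a label: ASCII letters, digits and hyphen.
LABEL_CHARS = set(string.ascii_letters + string.digits + "-")


def follows_listed_rules(name: str) -> bool:
    """Hypothetical helper that checks the bulleted rules without a regex."""
    # The full name may not exceed 253 characters.
    if len(name) > 253:
        return False
    labels = name.split(".")
    # At most 127 labels (levels).
    if len(labels) > 127:
        return False
    for label in labels:
        # Each label is 1 to 63 characters long (this also rejects "a..b").
        if not 1 <= len(label) <= 63:
            return False
        # Only letters, digits and hyphens.
        if not set(label) <= LABEL_CHARS:
            return False
        # No leading or trailing hyphen.
        if label.startswith("-") or label.endswith("-"):
            return False
    # The TLD (last label) may not be all-numeric.
    return not labels[-1].isdigit()
```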

Note 1: The full domain length check is not included in the regex. Check it natively instead, e.g. strlen($domain) <= 253 in PHP.
Note 2: This pattern works with most languages, including PHP, JavaScript and Python.
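Following Note 1, here is a minimal Python sketch that pairs the pattern with a separate native length check. `is_valid_domain` and `MAX_DOMAIN_LENGTH` are hypothetical names. `re.ASCII` (this sketch's choice) keeps `\d` to `0-9`, and `fullmatch` stops a trailing newline from slipping past `$`.

```python
import re

# The answer's pattern; re.ASCII restricts \d to 0-9.
DOMAIN_RE = re.compile(
    r"^(?!\-)(?:(?:[a-zA-Z\d][a-zA-Z\d\-]{0,61})?[a-zA-Z\d]\.){1,126}"
    r"(?!\d+)[a-zA-Z\d]{1,63}$",
    re.ASCII,
)

MAX_DOMAIN_LENGTH = 253  # textual length limit from the rules above


def is_valid_domain(name: str) -> bool:
    """Hypothetical validator: native length check first, then the regex."""
    if len(name) > MAX_DOMAIN_LENGTH:
        return False
    return DOMAIN_RE.fullmatch(name) is not None
```

For example, four 63-character labels joined by dots form a 255-character name. The regex alone accepts it, but `is_valid_domain` returns False because of the length check.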


More Info:

  • The regex above does not support IDNs.

  • No spec requires the extension (TLD) to be between 2 and 6 characters; a TLD can be up to 63 characters long. Also, some networks use custom/pseudo TLDs internally.

  • Registration authorities may impose extra, registry-specific rules that this regex does not enforce. For example, .CO.UK and .ORG.UK names must be at least 3 but fewer than 23 characters long, not counting the extension. Such rules are non-standard and subject to change, so do not implement them unless you can maintain them.

  • Regular expressions are great, but they are not the most effective or performant solution to every problem. Whenever possible, use a native URL parser instead, e.g. Python's urlparse() or PHP's parse_url().

  • After all, this is only a format check. A regex test does not confirm that a domain name actually exists or is configured; test that separately, e.g. by making a request to it.
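The points above can be combined in a hedged Python sketch. The standard library's `urllib.parse.urlparse` extracts the host from a full URL, and the answer's pattern is then applied to that host alone. `host_from_url` and `url_has_valid_domain` are hypothetical helper names. Note that `.hostname` is returned in lower case with any port removed. Like the regex itself, this checks only the format, not whether the domain exists.

```python
import re
from typing import Optional
from urllib.parse import urlparse

# The answer's pattern; re.ASCII (this sketch's choice) keeps \d to 0-9.
DOMAIN_RE = re.compile(
    r"^(?!\-)(?:(?:[a-zA-Z\d][a-zA-Z\d\-]{0,61})?[a-zA-Z\d]\.){1,126}"
    r"(?!\d+)[a-zA-Z\d]{1,63}$",
    re.ASCII,
)


def host_from_url(url: str) -> Optional[str]:
    """Hypothetical helper: return the host part of a URL, or None.

    urlparse() only recognizes a host when a network location is present
    (e.g. "https://domain.com/..."); "domain.com/folder" is parsed as a
    path, so hostname is None there.
    """
    return urlparse(url).hostname


def url_has_valid_domain(url: str) -> bool:
    """Hypothetical check: the URL has a host that passes length + regex."""
    host = host_from_url(url)
    if host is None or len(host) > 253:
        return False
    return DOMAIN_RE.fullmatch(host) is not None
```

If input may arrive without a scheme, one option is to prepend "//" before parsing, since urlparse() treats "//domain.com/folder" as having the network location "domain.com".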

Specs & References:

  • IETF: RFC1035
  • IETF: RFC1123
  • IETF: RFC2181
  • IETF: RFC952
  • Wikipedia: Domain Name System

UPDATE (2019-12-21): Fixed leading hyphen with subdomains.

Onur Yıldırım answered Oct 03 '22 19:10