
What is a regular expression which will match a valid domain name without a subdomain?



I know that this is a bit of an old post, but all of the regular expressions here are missing one very important component: support for internationalized domain names (IDNs).

In their ASCII form, IDN labels start with xn--. They allow Unicode characters beyond ASCII in domain names. For example, did you know "♡.com" is a valid domain name? Yeah, "love heart dot com"! To validate the domain name, you need to let http://xn--c6h.com/ pass the validation.

Note that to use this regex you will need to convert the domain to lower case and use an IDN library to encode domain names to ACE ("ASCII Compatible Encoding"). One good library is GNU Libidn.

idn(1) is the command-line interface to the internationalized domain name library. The following example converts a UTF-8 host name into ACE encoding. The resulting URL https://nic.xn--flw351e/ can then be used as the ACE-encoded equivalent of https://nic.谷歌/.

  $ idn --quiet -a nic.谷歌
  nic.xn--flw351e
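
The same conversion can be sketched in Python, assuming the standard library's built-in "idna" codec is good enough for your purposes (it follows the older IDNA 2003 rules, so some characters may be handled differently by newer tools). The helper name to_ace is made up for this illustration.

```python
def to_ace(domain: str) -> str:
    """Lower-case a domain and encode it to ACE (xn--) form.

    Hypothetical helper; it relies only on Python's built-in "idna" codec.
    """
    return domain.lower().encode("idna").decode("ascii")


print(to_ace("nic.谷歌"))      # nic.xn--flw351e
print(to_ace("Example.COM"))  # example.com
```

Lower-casing first matters because the codec leaves pure ASCII labels untouched, and the regex below only accepts lower-case input.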

This magic regular expression should cover most domains (although, I am sure there are many valid edge cases that I have missed):

^((?!-))(xn--)?[a-z0-9][a-z0-9-_]{0,61}[a-z0-9]{0,1}\.(xn--)?([a-z0-9\-]{1,61}|[a-z0-9-]{1,30}\.[a-z]{2,})$

When choosing a domain validation regex, you should check that it matches the following domains:

  1. xn--stackoverflow.com
  2. stackoverflow.xn--com
  3. stackoverflow.co.uk

If these three domains do not pass, your regular expression may be rejecting legitimate domains!
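
A quick way to run that check, sketched in Python. The regex is copied from this answer; the names DOMAIN_RE, CHECKS and failing are only illustrative, and the input is assumed to be lower-case and already ACE-encoded.

```python
import re

# Regex copied from this answer; expects lower-case, ACE-encoded input.
DOMAIN_RE = re.compile(
    r"^((?!-))(xn--)?[a-z0-9][a-z0-9-_]{0,61}[a-z0-9]{0,1}\."
    r"(xn--)?([a-z0-9\-]{1,61}|[a-z0-9-]{1,30}\.[a-z]{2,})$"
)

# The three sanity-check domains listed above.
CHECKS = ["xn--stackoverflow.com", "stackoverflow.xn--com", "stackoverflow.co.uk"]


def failing(pattern, domains=CHECKS):
    """Return the domains that the compiled pattern does NOT match."""
    return [d for d in domains if not pattern.match(d)]


print(failing(DOMAIN_RE))  # []
```

Passing any other compiled pattern to failing() shows which of the three it rejects.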

Check out The Internationalized Domain Names Support page from Oracle's International Language Environment Guide for more information.

Feel free to try out the regex here: http://www.regexr.com/3abjr

ICANN keeps a list of TLDs that have been delegated, which can be used to see some examples of IDN domains.


Edit:

 ^(((?!-))(xn--|_{1,1})?[a-z0-9-]{0,61}[a-z0-9]{1,1}\.)*(xn--)?([a-z0-9][a-z0-9\-]{0,60}|[a-z0-9-]{1,30}\.[a-z]{2,})$

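This regular expression is meant to stop labels that end with '-' from being marked as valid, and it allows any number of subdomains. Note that the second alternative, [a-z0-9-]{1,30}\.[a-z]{2,}, does not apply the same check, so a name such as stackoverflow-.com still matches.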
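
It is worth probing the edited regex with a few names before relying on it. A minimal sketch in Python (EDITED_RE and probe are illustrative names, and input is assumed to be lower-case and ACE-encoded):

```python
import re

# Regex from the edit above, copied unchanged.
EDITED_RE = re.compile(
    r"^(((?!-))(xn--|_{1,1})?[a-z0-9-]{0,61}[a-z0-9]{1,1}\.)*"
    r"(xn--)?([a-z0-9][a-z0-9\-]{0,60}|[a-z0-9-]{1,30}\.[a-z]{2,})$"
)


def probe(name: str) -> bool:
    """True if the edited regex accepts the name."""
    return EDITED_RE.match(name) is not None


print(probe("a.b.stackoverflow.co.uk"))  # True: several subdomains are fine
print(probe("foo-.example.com"))         # False: trailing '-' in an early label
print(probe("stackoverflow-.com"))       # True: the label right before the TLD is not checked
```

The last case slips through the second alternative of the final group, which accepts any run of letters, digits and hyphens followed by an alphabetic TLD.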


Well, it's a little sneakier than it looks (see comments), given your specific requirements:

/^[a-zA-Z0-9][a-zA-Z0-9-]{1,61}[a-zA-Z0-9]\.[a-zA-Z]{2,}$/

But note this will reject a lot of valid domains.
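
To see what gets rejected, here is a small Python sketch. The /…/ delimiters are dropped because Python's re module does not use them; STRICT_RE and accepts are illustrative names, and the sample domains are my own picks.

```python
import re

# Same pattern as above, without the surrounding slashes.
STRICT_RE = re.compile(r"^[a-zA-Z0-9][a-zA-Z0-9-]{1,61}[a-zA-Z0-9]\.[a-zA-Z]{2,}$")


def accepts(name: str) -> bool:
    """True if the strict pattern matches the name."""
    return STRICT_RE.match(name) is not None


for name in ["example.com", "t.co", "hp.com", "stackoverflow.co.uk", "example.xn--p1ai"]:
    print(f"{name:22} {accepts(name)}")
# Only example.com is accepted: the pattern needs a label of at least three
# characters, exactly one dot, and a purely alphabetic TLD.
```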


My regex is the following:

^[a-zA-Z0-9][a-zA-Z0-9-_]{0,61}[a-zA-Z0-9]{0,1}\.([a-zA-Z]{1,6}|[a-zA-Z0-9-]{1,30}\.[a-zA-Z]{2,3})$

It works for i.oh1.me and for wow.british-library.uk.

Update:

Here is the updated rule:

^(([a-zA-Z]{1})|([a-zA-Z]{1}[a-zA-Z]{1})|([a-zA-Z]{1}[0-9]{1})|([0-9]{1}[a-zA-Z]{1})|([a-zA-Z0-9][a-zA-Z0-9-_]{1,61}[a-zA-Z0-9]))\.([a-zA-Z]{2,6}|[a-zA-Z0-9-]{2,30}\.[a-zA-Z]{2,3})$

Regular expression visualization: https://www.debuggex.com/r/y4Xe_hDVO11bv1DV

Now it checks for '-' or '_' at the start or end of the domain label.
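
The difference between the two rules can be checked with a short Python sketch. Both patterns are copied from this answer; OLD_RE, NEW_RE and the sample name wow-.uk are only for illustration.

```python
import re

# First version of the rule.
OLD_RE = re.compile(
    r"^[a-zA-Z0-9][a-zA-Z0-9-_]{0,61}[a-zA-Z0-9]{0,1}\."
    r"([a-zA-Z]{1,6}|[a-zA-Z0-9-]{1,30}\.[a-zA-Z]{2,3})$"
)
# Updated rule.
NEW_RE = re.compile(
    r"^(([a-zA-Z]{1})|([a-zA-Z]{1}[a-zA-Z]{1})|([a-zA-Z]{1}[0-9]{1})"
    r"|([0-9]{1}[a-zA-Z]{1})|([a-zA-Z0-9][a-zA-Z0-9-_]{1,61}[a-zA-Z0-9]))\."
    r"([a-zA-Z]{2,6}|[a-zA-Z0-9-]{2,30}\.[a-zA-Z]{2,3})$"
)

for name in ["i.oh1.me", "wow.british-library.uk", "wow-.uk"]:
    print(f"{name:24} old={bool(OLD_RE.match(name))} new={bool(NEW_RE.match(name))}")
# Both rules accept the first two names; only the old rule accepts wow-.uk,
# whose first label ends with '-'.
```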


My bet:

^(?:[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z0-9][a-z0-9-]{0,61}[a-z0-9]$

Explained:

A domain name is built from segments (labels). Here is one segment (any except the final one):

[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?

It has 1-63 characters and does not start or end with '-'.

Now append '.' to it and repeat at least one time:

(?:[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\.)+

Then attach the final segment, which is 2-63 characters long:

[a-z0-9][a-z0-9-]{0,61}[a-z0-9]

Test it here: http://regexr.com/3au3g
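
The construction above can be mirrored in Python by assembling the pattern from its two segment pieces. A minimal sketch; the names LABEL, FINAL, DOMAIN_RE and is_domain are illustrative, and input is assumed to be lower-case.

```python
import re

LABEL = r"[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?"  # 1-63 chars, no '-' at either end
FINAL = r"[a-z0-9][a-z0-9-]{0,61}[a-z0-9]"       # final segment, 2-63 chars

# One or more "label." groups followed by the final segment.
DOMAIN_RE = re.compile(rf"^(?:{LABEL}\.)+{FINAL}$")


def is_domain(name: str) -> bool:
    """True if the assembled pattern matches the name."""
    return DOMAIN_RE.match(name) is not None


print(is_domain("example.com"))      # True
print(is_domain("bad-.com"))         # False: label ends with '-'
print(is_domain("localhost"))        # False: at least one '.' is required
print(is_domain("a" * 64 + ".com"))  # False: label longer than 63 characters
```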


This answer is for domain names (including service RRs), not host names (like an email hostname).

^(?=.{1,253}\.?$)(?:(?!-|[^.]+_)[A-Za-z0-9-_]{1,63}(?<!-)(?:\.|$)){2,}$

It is basically mkyong's answer, with these additions:

  • Max length of 255 octets including length prefixes and null root.
  • Allow trailing '.' for explicit dns root.
  • Allow leading '_' for service domain RRs, (bugs: doesn't enforce 15 char max for _ labels, nor does it require at least one domain above service RRs)
  • Matches all possible TLDs.
  • Doesn't capture subdomain labels.

By Parts

Lookahead, limit max length between ^$ to 253 characters with optional trailing literal '.'

(?=.{1,253}\.?$)

Lookahead, next character is not a '-' and no '_' follows any characters before the next '.'. That is to say, enforce that the first character of a label isn't a '-' and only the first character may be a '_'.

(?!-|[^.]+_)

Between 1 and 63 of the allowed characters per label.

[A-Za-z0-9-_]{1,63}

Lookbehind, previous character not '-'. That is to say, enforce that the last character of a label isn't a '-'.

(?<!-)

Force a '.' at the end of every label except the last, where it is optional.

(?:\.|$)

Mostly combined from the parts above. This requires at least two domain levels, which is not quite correct but is usually a reasonable assumption. Change {2,} to + if you want to allow TLDs or unqualified relative subdomains through (e.g., localhost, myrouter, to.).

(?:(?!-|[^.]+_)[A-Za-z0-9-_]{1,63}(?<!-)(?:\.|$)){2,}
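
For experimenting, the parts can be assembled in Python, with the label-count quantifier left as a parameter. This is only a sketch: the constant names and the build function are made up here, and Python's re module is assumed (it supports the fixed-width lookbehind used above).

```python
import re

LENGTH = r"(?=.{1,253}\.?$)"   # at most 253 chars, plus an optional trailing '.'
START = r"(?!-|[^.]+_)"        # no leading '-'; '_' only as the first char of a label
CHARS = r"[A-Za-z0-9-_]{1,63}"  # 1-63 allowed characters per label
END = r"(?<!-)"                # label must not end with '-'
SEP = r"(?:\.|$)"              # '.' after each label, optional after the last


def build(quantifier: str = "{2,}"):
    """Compile the full pattern; pass "+" to allow single-label names."""
    return re.compile("^" + LENGTH + "(?:" + START + CHARS + END + SEP + ")" + quantifier + "$")


STRICT = build()      # at least two labels
RELAXED = build("+")  # one label is enough

for name in ["example.com.", "_sip._tcp.example.com", "a_b.com", "localhost"]:
    print(f"{name:24} strict={bool(STRICT.match(name))} relaxed={bool(RELAXED.match(name))}")
```

With the default quantifier the first two names match and the last two do not; with "+" localhost matches as well, while a_b.com is still rejected because of the '_' in the middle of the label.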

Unit tests for this expression.


The accepted answer did not work for me. Try this:

^((?!-)[A-Za-z0-9-]{1,63}(?<!-)\.)+[A-Za-z]{2,6}$

See these unit test cases for validation.