
Number in the top-level domain?

Can top-level domains contain a number at the end? I don't know anything about DNS rules, but when I use PHP's filter_var() function with FILTER_VALIDATE_EMAIL on test@null.com1, it returns true.

JackTheKnife asked Jan 30 '12 21:01



2 Answers

Can a top-level domain contain a number at the end?

Technically, yes. The exception is a purely numeric label, which cannot be a TLD under current rules, for an easily understood reason: it would be ambiguous with IP addresses. In practice, however, a TLD cannot contain a digit at all, including at the end, unless it is an IDN TLD, because of rules enforced by ICANN.

Let us go back to some RFCs to have some clearer definitions of things:

RFC 952: DOD INTERNET HOST TABLE SPECIFICATION (October 1985)

This is the definition of an Internet "hostname" back then:

A "name" (Net, Host, Gateway, or Domain name) is a text string up to 24 characters drawn from the alphabet (A-Z), digits (0-9), minus sign (-), and period (.). Note that periods are only allowed when they serve to delimit components of "domain style names". (See RFC-921, "Domain Name System Implementation Schedule", for background). No blank or space characters are permitted as part of a name. No distinction is made between upper and lower case. The first character must be an alpha character. The last character must not be a minus sign or period.

Note that the same RFC also states:

Single character names or nicknames are not allowed.

Hence at that point:

  • com1 is a valid TLD
  • 3com is not ("The first character must be an alpha character.")
  • 42 is not (same reason)
  • 1 is not (same reason)
  • a is not ("Single character names or nicknames are not allowed.")
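As a rough illustration, the RFC 952 rules quoted above can be sketched as a small Python check. The function name `is_rfc952_name` is made up for this example, and it makes one simplifying assumption: it checks a single dot-free component, whereas RFC 952 applies the 24-character limit to the whole name.

```python
import string

LETTERS = set(string.ascii_letters)
ALLOWED = set(string.ascii_letters + string.digits + "-")


def is_rfc952_name(name: str) -> bool:
    """Check one dot-free name against the RFC 952 rules quoted above.

    Hypothetical helper for illustration only, not a full RFC 952 parser.
    """
    return (
        2 <= len(name) <= 24                     # no single-character names, at most 24 characters
        and all(ch in ALLOWED for ch in name)    # only letters, digits, and minus sign
        and name[0] in LETTERS                   # first character must be alphabetic
        and name[-1] != "-"                      # last character must not be a minus sign
    )


for candidate in ["com1", "3com", "42", "1", "a"]:
    print(candidate, is_rfc952_name(candidate))
```

Only `com1` passes, which matches the list above.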

RFC 1034: DOMAIN NAMES - CONCEPTS AND FACILITIES (November 1987)

This is one of the RFCs that created the DNS as we know it today. For compatibility reasons, it defined hostnames as a sequence of labels, where a label is defined as follows:

They must start with a letter, end with a letter or digit, and have as interior characters only letters, digits, and hyphen. There are also some restrictions on the length. Labels must be 63 characters or less.

The TLD is one label among others (the L in TLD). Per the above rule, com1 is a valid label, and hence a valid TLD, whereas 3com would not have been. This brings us directly to the following amendment.

RFC 1123: Requirements for Internet Hosts -- Application and Support (October 1989)

This amends the previous RFC by changing one rule:

The syntax of a legal Internet host name was specified in RFC-952 [DNS:4]. One aspect of host name syntax is hereby changed: the restriction on the first character is relaxed to allow either a letter or a digit. Host software MUST support this more liberal syntax.

So at that point:

  • com1 is a valid TLD
  • 3com is also valid
  • 42 is valid
  • 1 is valid
  • a is valid
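To make the difference between the two rule sets concrete, here is a minimal sketch that expresses the RFC 1034 label rule and the RFC 1123 relaxation as regular expressions. The pattern names and the `check` helper are assumptions of this example, and the patterns only cover the syntax quoted above (letters, digits, hyphen, 63-character limit), nothing else from either RFC.

```python
import re

# RFC 1034 label: starts with a letter, ends with a letter or digit,
# interior characters are letters, digits, or hyphens, at most 63 characters.
RFC1034_LABEL = re.compile(r"[A-Za-z](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")

# RFC 1123 relaxation: the first character may also be a digit.
RFC1123_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def check(label: str) -> tuple[bool, bool]:
    """Return (valid under RFC 1034, valid under RFC 1123) for one label."""
    return (
        RFC1034_LABEL.fullmatch(label) is not None,
        RFC1123_LABEL.fullmatch(label) is not None,
    )


for label in ["com1", "3com", "42", "1", "a"]:
    print(label, check(label))
```

Under these patterns, `3com`, `42`, and `1` fail the RFC 1034 rule but pass the relaxed RFC 1123 rule, while `com1` passes both.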

For the case of "numerical" TLDs, the following rules from the same document (RFC 1123, Section 2.1) apply:

Whenever a user inputs the identity of an Internet host, it SHOULD be possible to enter either (1) a host domain name or (2) an IP address in dotted-decimal ("#.#.#.#") form. The host SHOULD check the string syntactically for a dotted-decimal number before looking it up in the Domain Name System.

and

If a dotted-decimal number can be entered without such identifying delimiters, then a full syntactic check must be made, because a segment of a host domain name is now allowed to begin with a digit and could legally be entirely numeric (see Section 6.1.2.4). However, a valid host name can never have the dotted-decimal form #.#.#.#, since at least the highest-level component label will be alphabetic.
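The syntactic check that RFC 1123 recommends before a DNS lookup could look roughly like the sketch below. The name `classify_host` is hypothetical, and the check is purely about the `#.#.#.#` shape: it deliberately does not verify that each group is in the 0-255 range.

```python
import re

# Four groups of ASCII digits separated by dots, i.e. the "#.#.#.#" form.
DOTTED_DECIMAL = re.compile(r"[0-9]+(?:\.[0-9]+){3}")


def classify_host(text: str) -> str:
    """Decide, by syntax alone, whether input looks like an IPv4 address.

    Illustrative only: anything that is not dotted-decimal is treated as a
    candidate domain name and would then be looked up in the DNS.
    """
    if DOTTED_DECIMAL.fullmatch(text):
        return "ip-address"
    return "domain-name"


print(classify_host("74.125.244.123"))
print(classify_host("null.com1"))
print(classify_host("example.42"))
```

With this ordering, a name whose four labels are all numeric can never be reached as a domain name, which is why the quoted text assumes that the highest-level label is alphabetic.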

RFC 1738: Uniform Resource Locators (URL) (December 1994)

This also addresses the TLD, stating:

The fully qualified domain name of a network host, or its IP address as a set of four decimal digit groups separated by ".". Fully qualified domain names take the form as described in Section 3.5 of RFC 1034 [13] and Section 2.1 of RFC 1123 [5]: a sequence of domain labels separated by ".", each domain label starting and ending with an alphanumerical character and possibly also containing "-" characters. The rightmost domain label will never start with a digit, though, which syntactically distinguishes all domain names from the IP addresses.

RFC 3696: Application Techniques for Checking and Transformation of Names (February 2004)

This was needed to introduce IDNs (Internationalized Domain Names) and it has this to say:

Any characters, or combination of bits (as octets), are permitted in DNS names. However, there is a preferred form that is required by most applications. This preferred form has been the only one permitted in the names of top-level domains, or TLDs. In general, it is also the only form permitted in most second-level names registered in TLDs, although some names that are normally not seen by users obey other rules. It derives from the original ARPANET rules for the naming of hosts (i.e., the "hostname" rule) and is perhaps better described as the "LDH rule", after the characters that it permits. The LDH rule, as updated, provides that the labels (words or strings separated by periods) that make up a domain name must consist of only the ASCII [ASCII] alphabetic and numeric characters, plus the hyphen. No other symbols or punctuation characters are permitted, nor is blank space. If the hyphen is used, it is not permitted to appear at either the beginning or end of a label. There is an additional rule that essentially requires that top-level domain names not be all- numeric.

In fact, as soon as IDNs are involved, and IDN TLDs do exist (both ccTLDs and gTLDs now), the chosen encoding generates an ASCII string of the form xn--something, where the something can contain digits, including at the end, as shown in other answers.

However, it is not really clear where the "additional rule" in the last sentence of the RFC 3696 quote comes from.
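To illustrate the xn--something form, the sketch below converts an A-label to its Unicode form and back with Python's built-in `idna` codec. The label `xn--fiqs8s` is taken (lowercased) from the list of TLDs in the other answer; the helper names are made up here, and note that the built-in codec implements the older IDNA 2003 rules rather than the RFC 5890-5894 series.

```python
def to_unicode(a_label: str) -> str:
    """Decode an ASCII A-label (xn--...) into its Unicode form."""
    return a_label.encode("ascii").decode("idna")


def to_ascii(u_label: str) -> str:
    """Encode a Unicode label into its ASCII A-label form."""
    return u_label.encode("idna").decode("ascii")


a_label = "xn--fiqs8s"          # from the TLD list in the other answer
u_label = to_unicode(a_label)

print(u_label)                                   # the non-ASCII form of the TLD
print(to_ascii(u_label) == a_label)              # round trip gives the same A-label
print(any(ch in "0123456789" for ch in a_label)) # the ASCII form contains digits
```

The digits in the A-label are an artifact of the encoding, which is why such TLDs can contain them even where an ordinary ASCII TLD cannot.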

RFC 4697: Observed DNS Resolution Misbehavior (October 2006)

Not defining anything, but providing some interesting facts:

The root name servers receive a significant number of A record queries where the QNAME looks like an IPv4 address.

and

A possible solution is to delegate these numeric TLDs from the root zone to a separate set of servers to absorb the traffic.

This clearly shows that, in the wild, some applications send queries for names formatted like IPv4 addresses, and therefore with a fully numeric "TLD". They may do so by mistake, but it at least shows that such queries work technically.

There was in fact an experiment to launch a .42 registry, obviously completely outside of the ICANN ecosystem. You can see a summary of it at http://www.dotsauce.com/experimental-numeric-tld-42-domain/ and an archive of their main explanations at https://web.archive.org/web/20101222151118/http://register.42registry.org:80/ (in French).

It did not go far, even though it technically worked.

It showed, for example, that Microsoft-based operating systems did not handle purely numeric TLDs at all by default, although Microsoft provided a patch for that: https://support.microsoft.com/en-us/help/947228/error-message-when-you-try-to-join-a-windows-vista-based-client-comput "When you try to join a Windows Vista-based client computer to a top level domain (TLD) that has a purely numeric suffix, the Windows Vista-based client computer cannot join the domain. [..] This behavior is by design."

Internet-Draft draft-liman-tld-names-06: Top Level Domain Name Specification (November 2011)

This finally gives some explanation of why purely numeric TLDs, or even TLDs containing a single digit, are sometimes considered invalid even though this does not clearly follow from the specifications above:

(section 2.1 below refers to content in RFC 1123, quoted above)

In addition, the DISCUSSION section of Section 2.1 says:

'However, a valid host name can never have the dotted-decimal form #.#.#.#, since at least the highest-level component label will be alphabetic.' [Section 2.1]

Some implementers may have understood the above phrase 'will be alphabetic' to be a protocol restriction.

But it basically just recommends going with the flow and keeping the same restrictions:

Neither [RFC0952] nor [RFC1123] explicitly states the reasons for these restrictions. It might be supposed that human factors were a consideration; [RFC1123] appears to suggest that one of the reasons was to prevent confusion between dotted-decimal IPv4 addresses and host domain names. In any case, it is reasonable to believe that the restrictions have been assumed in some deployed software, and that changes to the rules should be undertaken with caution.

Hence it offered this definition:

traditional-tld-label = 1*63(ALPHA)
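The ABNF rule above translates directly into a regular expression, since ABNF `ALPHA` means the ASCII letters A-Z and a-z. A minimal sketch, with the pattern and function names chosen for this example only:

```python
import re

# traditional-tld-label = 1*63(ALPHA): between 1 and 63 ASCII letters.
TRADITIONAL_TLD_LABEL = re.compile(r"[A-Za-z]{1,63}")


def is_traditional_tld_label(label: str) -> bool:
    """True if the label consists of 1 to 63 ASCII letters and nothing else."""
    return TRADITIONAL_TLD_LABEL.fullmatch(label) is not None


for label in ["com", "com1", "3com", "42"]:
    print(label, is_traditional_tld_label(label))
```

Under this definition, any label containing a digit is rejected, wherever the digit appears.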

This draft never became an RFC because not everyone agreed with it. You can find a thread with dissenting voices at https://www.ietf.org/mail-archive/web/dnsop/current/msg08866.html ; basically, it was not clear whether there had been a restriction in the past that was now being relaxed a little, or whether there had never been a restriction to begin with and people had simply implemented systems wrongly.

For an example, see this Chromium/Chrome bug report: https://bugs.chromium.org/p/chromium/issues/detail?id=31405 Browsing failed when the TLD started with a digit or was purely numeric (it worked when the TLD began with letters and ended with a digit). This was not considered a bug and has not been fixed, because the browser ships with a list of TLDs, so it knows which ones are valid and which are not, in addition to testing their syntax.

ICANN Application Guidebook for new TLDs (June 2012)

Available at https://newgtlds.icann.org/en/applicants/agb/guidebook-full-04jun12-en.pdf , it says the following, starting at page 64:

The ASCII label (i.e., the label as transmitted on the wire) must be valid as specified in technical standards Domain Names: Implementation and Specification (RFC 1035), and Clarifications to the DNS Specification (RFC 2181) and any updates thereto.

The ASCII label must be a valid host name, as specified in the technical standards DOD Internet Host Table Specification (RFC 952), Requirements for Internet Hosts — Application and Support (RFC 1123), and Application Techniques for Checking and Transformation of Names (RFC 3696), Internationalized Domain Names in Applications (IDNA)(RFCs 5890-5894), and any updates thereto. This includes the following:

The ASCII label must consist entirely of letters (alphabetic characters a-z), or

The label must be a valid IDNA A-label (further restricted as described in Part II below).

Note especially: The ASCII label must consist entirely of letters (alphabetic characters a-z)

This immediately rules out any fully numeric TLD, and in fact any digit at all, including at the end, except for IDN TLDs, the ones of the form xn--something.
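Applied to the original question, the two-branch rule from the guidebook can be sketched as follows. This is an approximation under stated assumptions: the function name is hypothetical, the A-label branch only checks the `xn--` prefix and the allowed characters rather than validating the Punycode, and none of the guidebook's further restrictions on IDN labels are modeled.

```python
import re

# Branch 1: the label consists entirely of letters a-z.
LETTERS_ONLY = re.compile(r"[a-z]{1,63}")

# Branch 2: the label has the shape of an IDNA A-label (xn-- prefix).
# Shape check only; real validation would also decode the Punycode.
A_LABEL_SHAPE = re.compile(r"xn--[a-z0-9-]{0,58}[a-z0-9]")


def passes_agb_string_rule(label: str) -> bool:
    """Rough check of the 'letters only, or A-label' rule quoted above."""
    label = label.lower()
    return bool(LETTERS_ONLY.fullmatch(label) or A_LABEL_SHAPE.fullmatch(label))


for label in ["com", "com1", "123", "xn--fiqs8s"]:
    print(label, passes_agb_string_rule(label))
```

So `com1` from the question would be rejected under this rule even though it is a syntactically valid hostname label, which is the gap that a purely syntactic email validator does not cover.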

Note that someone asked ICANN directly about this and got the following reply, shown at https://domaingang.com/domain-news/icann-applicant-handbook-this-is-why-we-cannot-have-numeric-gtlds/ :

Please note Numeric TLD’s were prohibited in the first round of applications. The prohibition on numeric gTLDs in the applicant guidebook (http://newgtlds.icann.org/en/applicants/agb) derives from a number of technical concerns regarding the ability of such domains to operate properly. Domain names are often used in place where other kinds of identifiers may be used like IP addresses.

The fact that a TLD is all alphabetic is often a key determinant for software in identifying a domain name. If a TLD such as “.123” were allowed, you could have a domain name of “74.125.244.123” which would be difficult to discriminate from an IP address “74.125.244.123.”. There are also other considerations: some technical standards documentation states that TLDs will be alphabetical, which has been codified as an assumption in software also.

The limitation in the AGB to alphabetic characters was designed to limit these scenarios that means such TLDs are not likely to work well in software, as well as limit potential security issues that may result from the same issues.

Patrick Mevzek answered Oct 03 '22 01:10


Actually there are quite a few TLDs currently in use that contain numbers:

XN--1QQW23A XN--3BST00M XN--3DS443G XN--3E0B707E XN--45BRJ9C XN--4GBRIM XN--55QW42G XN--55QX5D XN--6FRZ82G XN--6QQ986B3XL XN--80ADXHKS XN--80AO21A XN--80ASEHDB XN--80ASWG XN--90A3AC XN--C1AVG XN--CG4BKI XN--CLCHC0EA0B2G2A9GCD XN--CZR694B XN--CZRU2D XN--D1ACJ3B XN--FIQ228C5HS XN--FIQ64B XN--FIQS8S XN--FIQZ9S XN--FPCRJ9C3D XN--FZC2C9E2C XN--GECRJ9C XN--H2BRJ9C XN--I1B6B1A6A2E XN--IO0A7I XN--J1AMH XN--J6W193G XN--KPRW13D XN--KPRY57D XN--KPUT3I XN--L1ACC XN--LGBBAT1AD8J XN--MGB9AWBF XN--MGBA3A4F16A XN--MGBAAM7A8H XN--MGBAB2BD XN--MGBAYH7GPA XN--MGBBH1A71E XN--MGBC0A9AZCG XN--MGBERP4A5D4AR XN--MGBX4CD0AB XN--NGBC5AZD XN--NQV7F XN--NQV7FS00EMA XN--O3CW4H XN--OGBPF8FL XN--P1AI XN--PGBS0DH XN--Q9JYB4C XN--RHQV96G XN--S9BRJ9C XN--SES554G XN--UNUP4Y XN--VHQUV XN--WGBH1C XN--WGBL6A XN--XHQ521B XN--XKC2AL3HYE2A XN--XKC2DL3A5EE0H XN--YFRO4I67O XN--YGBI2AMMX XN--ZFR164B 

You can see an up-to-date list at data.iana.org/TLD/tlds-alpha-by-domain.txt or a list with descriptions at swcs.com.au/tld.htm
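If you want to extract such TLDs from the IANA list yourself, a small filter is enough. The sketch below assumes the file is plain text with one TLD per line and comment lines starting with `#`; the function name is made up, and the sample text is a hand-written excerpt, not the real file, so that the example runs without network access.

```python
def tlds_with_digits(listing: str) -> list[str]:
    """Return the TLDs in a one-per-line listing that contain an ASCII digit."""
    found = []
    for line in listing.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        if any(ch in "0123456789" for ch in line):
            found.append(line)
    return found


# Hypothetical excerpt for illustration, not the real file contents.
sample = """# sample excerpt
COM
NET
XN--FIQS8S
XN--P1AI
"""

print(tlds_with_digits(sample))
```

On the real list, every match should be an XN-- label, in line with the ICANN rule described in the other answer.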

insaner answered Oct 03 '22 02:10