JavaScript/Regex for finding just the root domain name without subdomains

I searched and found lots of similar regex examples, but not quite what I need.

I want to be able to pass in the following URLs and get these results:

  • www.google.com returns google.com

  • sub.domains.are.cool.google.com returns google.com

  • doesntmatterhowlongasubdomainis.idont.wantit.google.com returns google.com

  • sub.domain.google.com/no/thanks returns google.com
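For reference, the behaviour asked for above can be sketched without a regex at all, assuming the "root domain" is always the last two dot-separated labels of the hostname. That assumption holds for google.com but not for suffixes such as co.uk, and the helper name rootDomain is hypothetical.

```javascript
// Hypothetical helper: returns the last two labels of the hostname.
// ASSUMPTION: every root domain has exactly two labels (e.g. google.com),
// so this does NOT handle multi-label suffixes such as co.uk.
function rootDomain(input) {
  // Drop any scheme ("http://", "https://") so inputs with and without one behave alike.
  const withoutScheme = input.replace(/^[a-z][a-z0-9+.-]*:\/\//i, "");
  // Keep only the host: cut at the first "/", "?", "#" or ":" (port).
  const host = withoutScheme.split(/[\/?#:]/)[0];
  // Take the final two labels and rejoin them.
  return host.split(".").slice(-2).join(".");
}

const examples = [
  "www.google.com",
  "sub.domains.are.cool.google.com",
  "doesntmatterhowlongasubdomainis.idont.wantit.google.com",
  "sub.domain.google.com/no/thanks",
];

for (const url of examples) {
  console.log(`${url} -> ${rootDomain(url)}`);
}
```

Each of the four examples prints google.com; anything with a two-part public suffix needs a smarter rule.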

Hope that makes sense :) Thanks in advance!

James

asked Aug 09 '10 12:08 by jamesmhaley




1 Answer

I've not done a lot of testing on this, but if I understand what you're asking for, this should be a decent starting point...

([A-Za-z0-9-]+\.([A-Za-z]{3,}|[A-Za-z]{2}\.[A-Za-z]{2}|[A-Za-z]{2}))\b

EDIT:

To clarify, it's looking for:

one or more alphanumeric characters or hyphens, followed by a literal dot

and then one of three things...

  1. three or more alpha characters (e.g. com, net, mil, coop)
  2. two alpha characters, followed by a literal dot, followed by two more alphas (e.g. co.uk)
  3. two alpha characters (e.g. us, uk, to)
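To see the three alternatives in isolation, the suffix part of the pattern can be wrapped in ^...$ anchors and tested against sample suffixes. This is only an illustration: the anchors and the non-capturing group are added here and are not part of the answer's pattern.

```javascript
// The three suffix alternatives from the answer, anchored so the WHOLE
// string must be one of: 3+ letters, "xx.xx", or exactly 2 letters.
const suffix = /^(?:[A-Za-z]{3,}|[A-Za-z]{2}\.[A-Za-z]{2}|[A-Za-z]{2})$/;

const samples = ["com", "coop", "co.uk", "us", "c", "com.au"];
for (const s of samples) {
  console.log(`${s}: ${suffix.test(s)}`);
}
```

Note that "com.au" is rejected as a single suffix, because none of the alternatives describes three letters, a dot, then two letters.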

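and at the end of that, a word boundary (\b): a zero-width position where a word character meets a non-word character or the start/end of the string (in JavaScript regex, word characters are the ASCII letters, digits, and underscore). Note that a literal dot is also a non-word character, so \b matches between "google" and ".com" as well. Run unanchored against "www.google.com", the pattern therefore returns "www.google" rather than "google.com". Anchoring the match to the end of the hostname, for example by replacing \b with the lookahead (?![\w.-]), makes it return google.com for all four of the question's examples.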
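A quick way to check the word-boundary behaviour is to run the pattern side by side with a variant that swaps \b for the negative lookahead (?![\w.-]), which only succeeds when no letter, digit, underscore, dot, or hyphen follows. The lookahead is an editorial substitution, not part of the original answer; both patterns use A-Z in every character class.

```javascript
// The answer's pattern: \b also matches before ".com", so the first
// label pair in the string wins.
const withBoundary =
  /([A-Za-z0-9-]+\.([A-Za-z]{3,}|[A-Za-z]{2}\.[A-Za-z]{2}|[A-Za-z]{2}))\b/;

// Variant (assumption): the match must end where the hostname ends, i.e.
// it may not be followed by a word character, "." or "-".
const anchored =
  /([A-Za-z0-9-]+\.([A-Za-z]{3,}|[A-Za-z]{2}\.[A-Za-z]{2}|[A-Za-z]{2}))(?![\w.-])/;

const inputs = [
  "www.google.com",
  "sub.domains.are.cool.google.com",
  "doesntmatterhowlongasubdomainis.idont.wantit.google.com",
  "sub.domain.google.com/no/thanks",
  "www.example.co.uk",
];

for (const input of inputs) {
  const a = input.match(withBoundary);
  const b = input.match(anchored);
  // Capture group 1 holds the candidate domain.
  console.log(input, "|", a && a[1], "|", b && b[1]);
}
```

With \b the first column shows "www.google", "sub.domains", and so on; the anchored variant gives google.com for the question's inputs and example.co.uk for the last one.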

As I say, I didn't do much testing, but it seemed a reasonable jumping-off point. You'd likely need to try it and tune it some, and even then, it's unlikely that you'll get every test case right. Unicode (internationalized) domain names, multi-part suffixes that don't fit the two-letters-dot-two-letters form (com.au, for instance), and all sorts of technically-valid-but-you'll-likely-not-encounter-in-the-wild things will trip up a simple regex like this, but it should handle the common cases.
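Two of those limitations are easy to demonstrate. The sketch below assumes a runtime with the WHATWG URL class (Node.js or a browser) and lets URL normalise the host first: it strips the path and port, lower-cases the name, and converts a Unicode hostname to its ASCII "xn--" (Punycode) form, which the ASCII-only pattern can then match. The helper name guessRootDomain, the http:// default, and the (?![\w.-]) end anchor are all illustrative assumptions.

```javascript
// End-anchored variant of the answer's pattern (editorial substitution of
// (?![\w.-]) for \b so the match must end where the hostname ends).
const domainRe =
  /([A-Za-z0-9-]+\.([A-Za-z]{3,}|[A-Za-z]{2}\.[A-Za-z]{2}|[A-Za-z]{2}))(?![\w.-])/;

// Hypothetical helper: normalise the host with the URL parser, then match.
function guessRootDomain(input) {
  // URL requires a scheme; assume http:// when the input has none.
  const withScheme = /^[a-z][a-z0-9+.-]*:\/\//i.test(input)
    ? input
    : `http://${input}`;
  // hostname excludes path and port, is lower-cased and IDNA (Punycode) encoded.
  const host = new URL(withScheme).hostname;
  const m = host.match(domainRe);
  return m ? m[1] : null;
}

console.log(guessRootDomain("sub.domain.google.com/no/thanks")); // google.com
console.log(guessRootDomain("www.bücher.de"));                  // xn--bcher-kva.de
console.log(guessRootDomain("www.google.com.au"));              // com.au (wrong)
```

The last line shows the com.au case failing: the leftmost position where the anchored pattern succeeds is "com.au" itself. Handling such suffixes reliably generally means consulting a list of public suffixes rather than extending the regex.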

answered Oct 05 '22 06:10 by theraccoonbear