I had a search and found lot's of similar regex examples, but not quite what I need.
I want to be able to pass in the following urls and return the results:
www.google.com returns google.com
sub.domains.are.cool.google.com returns google.com
doesntmatterhowlongasubdomainis.idont.wantit.google.com returns google.com
sub.domain.google.com/no/thanks returns google.com
Hope that makes sense :) Thanks in advance!-James
The domain name should be a-z or A-Z or 0-9 and hyphen (-). The domain name should be between 1 and 63 characters long. The domain name should not start or end with a hyphen(-) (e.g. -geeksforgeeks.org or geeksforgeeks.org-). The last TLD (Top level domain) must be at least two characters and a maximum of 6 characters.
Domain names are formed by the rules and procedures of the Domain Name System (DNS). Any name registered in the DNS is a domain name. To find your root domain name, you can use the "dig" or "nslookup" command. For example, if your domain is "example.com", you would type "dig example.com" or "nslookup example.com".
The root domain is the overarching structure which contains the subdomains and every URL. If you want the data for an entire site, sticking with the root domain will likely be the easiest way to access this data. Root domains are sometimes subdivided into other smaller domains called subdomains.
To recap, a subdomain is the portion of a URL that comes before the “main” domain name and the domain extension. For example, docs.themeisle.com . Subdomains can help you divide your website into logical parts or create separate sites, for example a separate blog for each sports team.
I've not done a lot of testing on this, but if I understand what you're asking for, this should be a decent starting point...
([A-Za-z0-9-]+\.([A-Za-z]{3,}|[A-Za-z]{2}\.[A-Za-z]{2}|[A-za-z]{2}))\b
EDIT:
To clarify, it's looking for:
one or more alpha-numeric characters or dashes, followed by a literal dot
and then one of three things...
and at the end of that, a word boundary (\b) meaning the end of the string, a space, or a non-word character (in regex word characters are typically alpha-numerics, and underscore).
As I say, I didn't do much testing, but it seemed a reasonable jumping off point. You'd likely need to try it and tune it some, and even then, it's unlikely that you'll get 100% for all test cases. There are considerations like Unicode domain names and all sorts of technically-valid-but-you'll-likely-not-encounter-in-the-wild things that'll trip up a simple regex like this, but this'll probably get you 90%+ of the way there.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With