
How can I match the root of a domain name without www. using regex?

I am trying to match the root of a domain name with a regular expression in JS. I run into a problem when the URLs do not all contain www.

For example, I tried to match against this string:

(http://web.archive.org/web/20080620033027/http://www.mrvc.indianrail.gov.in/overview.htm)

This is the regex I tried, tested on regex101.com:

/(?<=(\/\/(www\.)|\/\/)).+?(?=\/)/g

I expect an output array with the names web.archive.org and mrvc.indianrail.gov.in, but I get web.archive.org and www.mrvc.indianrail.gov.in, with the www. still present in the second case.
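For reference, the behaviour described above can be reproduced with a short snippet. This is only a sketch: the variable names are made up for illustration, and it assumes an engine with lookbehind support (ES2018, e.g. a recent Node.js or Chrome).

```javascript
// The sample string from the question.
const sampleInput =
  "(http://web.archive.org/web/20080620033027/http://www.mrvc.indianrail.gov.in/overview.htm)";

// The regex as posted: the lookbehind accepts either "//www." or a bare "//".
const originalPattern = /(?<=(\/\/(www\.)|\/\/)).+?(?=\/)/g;

// With the g flag, String.prototype.match returns every full match.
const originalMatches = sampleInput.match(originalPattern);

console.log(originalMatches);
// → [ 'web.archive.org', 'www.mrvc.indianrail.gov.in' ]
```

The second result keeps its www. because the engine tries start positions from left to right, and the position right after the second // already satisfies the bare \/\/ alternative of the lookbehind.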

Vsevolod Fedorov asked Apr 07 '19 23:04





1 Answer

What about this regex:

(?<=https?:\/\/(?:www\.)?)(?!www\.).+?(?=\/)

It matches web.archive.org and mrvc.indianrail.gov.in, without the www.

Demo: https://regex101.com/r/5ZqK7n/3/

Differences with your initial regex:

  • The positive lookbehind now starts with https?:\/\/, where the s? adds support for https: URLs (remove it if not necessary).
  • Inside the lookbehind, (?:www\.)? is optional, so it can appear 0 or 1 times.
  • After the lookbehind, a negative lookahead (?!www\.) prevents a match from starting at www., so your .+? can no longer swallow the initial www.
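Here is a minimal sketch of how the regex could be used from JS. The helper name extractHosts and the variable names are only for illustration, and it assumes an engine with lookbehind support (ES2018).

```javascript
// Regex from this answer: optional "www." is consumed by the lookbehind,
// and the negative lookahead refuses to start a match at "www.".
const rootPattern = /(?<=https?:\/\/(?:www\.)?)(?!www\.).+?(?=\/)/g;

// Hypothetical helper: returns every host found in the text, or an empty
// array when nothing matches (match() returns null in that case).
function extractHosts(text) {
  return text.match(rootPattern) || [];
}

const sample =
  "(http://web.archive.org/web/20080620033027/http://www.mrvc.indianrail.gov.in/overview.htm)";

console.log(extractHosts(sample));
// → [ 'web.archive.org', 'mrvc.indianrail.gov.in' ]
```

Note that the trailing lookahead (?=\/) requires a / after the host, so a bare URL such as http://example.com with nothing after the domain will not match.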

Allan answered Oct 20 '22 21:10
