
JavaScript Regex URL extract domain only

Currently I can extract the 'domain' from any URL with the following regex:

/^(?:https?:\/\/)?(?:[^@\n]+@)?(?:www\.)?([^:\/\n\?\=]+)/im

However, I'm also getting subdomains, which I want to avoid. For example, if I have these sites:

  • www.google.com
  • yahoo.com/something
  • freds.meatmarket.co.uk?someparameter
  • josh.meatmarket.co.uk/asldf/asdf

I currently get:

  • google.com
  • yahoo.com
  • freds.meatmarket.co.uk
  • josh.meatmarket.co.uk
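
A minimal sketch that reproduces this behavior, assuming the regex is applied to each input string on its own. The helper name extractHost is made up for illustration, and `var` is used so the snippet should also run on older script runtimes:

```javascript
// The pattern from the question, unchanged.
var hostPattern = /^(?:https?:\/\/)?(?:[^@\n]+@)?(?:www\.)?([^:\/\n\?\=]+)/im;

// Hypothetical helper: returns the first capture group, or null if nothing matches.
function extractHost(url) {
  var match = hostPattern.exec(url);
  return match ? match[1] : null;
}

var samples = [
  'www.google.com',
  'yahoo.com/something',
  'freds.meatmarket.co.uk?someparameter',
  'josh.meatmarket.co.uk/asldf/asdf'
];

var hosts = samples.map(extractHost);
// hosts is now:
// ['google.com', 'yahoo.com', 'freds.meatmarket.co.uk', 'josh.meatmarket.co.uk']
```

Only a leading "www." is stripped by the pattern; any other subdomain label stays in the captured group.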

For the last two, I would like to drop the freds and josh subdomain portions and extract only the true domain, which would be just meatmarket.co.uk.

I did find another Stack Overflow question that tries to solve this in PHP; unfortunately, I don't know PHP. Is this translatable to JS? (I'm actually using Google Apps Script, FYI.)

  function topDomainFromURL($url) {
    $url_parts = parse_url($url);
    $domain_parts = explode('.', $url_parts['host']);
    if (strlen(end($domain_parts)) == 2 ) { 
      // ccTLD here, get last three parts
      $top_domain_parts = array_slice($domain_parts, -3);
    } else {
      $top_domain_parts = array_slice($domain_parts, -2);
    }
    $top_domain = implode('.', $top_domain_parts);
    return $top_domain;
  }

MarkII asked Jan 15 '16 19:01


1 Answer

Try this:

https?:\/\/(www\.)?[-a-zA-Z0-9@:%._\+~#=]{2,256}\.([a-z]{2,6}){1}
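
Note that the pattern above requires an http:// or https:// prefix and still matches the full host name, subdomains included. As a rough alternative, here is a sketch of how the PHP function from the question might be translated to JavaScript. It is an assumption-laden translation, not a tested Apps Script solution: it reuses the question's regex instead of parse_url to find the host, and it keeps the PHP version's heuristic that a two-letter final label means a ccTLD with a second-level suffix such as co.uk.

```javascript
// Host-matching pattern copied from the question (stands in for PHP's parse_url).
var HOST_PATTERN = /^(?:https?:\/\/)?(?:[^@\n]+@)?(?:www\.)?([^:\/\n\?\=]+)/im;

function topDomainFromURL(url) {
  var match = HOST_PATTERN.exec(url);
  if (!match) {
    return null; // no host-like text found
  }
  var parts = match[1].split('.');
  // Heuristic from the PHP version: a two-letter last label is treated as a
  // ccTLD, so keep the last three labels; otherwise keep the last two.
  var keep = parts[parts.length - 1].length === 2 ? 3 : 2;
  return parts.slice(-keep).join('.');
}

topDomainFromURL('www.google.com');                        // 'google.com'
topDomainFromURL('yahoo.com/something');                   // 'yahoo.com'
topDomainFromURL('freds.meatmarket.co.uk?someparameter');  // 'meatmarket.co.uk'
topDomainFromURL('josh.meatmarket.co.uk/asldf/asdf');      // 'meatmarket.co.uk'
```

The heuristic is only approximate. A host such as shop.example.de would come back unchanged, because .de domains are registered directly under the two-letter TLD. Handling every suffix correctly would need a lookup against something like the Public Suffix List rather than a length check.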

osanger answered Sep 28 '22 03:09