Currently I can extract the 'domain' from any URL with the following regex:
/^(?:https?:\/\/)?(?:[^@\n]+@)?(?:www\.)?([^:\/\n\?\=]+)/im
However I'm also getting subdomain's too which I want to avoid. For example if I have sites:
I currently get:
Those last two I would like to exclude the freds and josh subdomain portion and extract only the true domain which would just be meatmarket.co.uk.
I did find another SOF that tries to solve in PHP, unfortunately I don't know PHP. is this translatable to JS (I'm actually using Google Script FYI)?
  function topDomainFromURL($url) {
    $url_parts = parse_url($url);
    $domain_parts = explode('.', $url_parts['host']);
    if (strlen(end($domain_parts)) == 2 ) { 
      // ccTLD here, get last three parts
      $top_domain_parts = array_slice($domain_parts, -3);
    } else {
      $top_domain_parts = array_slice($domain_parts, -2);
    }
    $top_domain = implode('.', $top_domain_parts);
    return $top_domain;
  }
                Try this:
https?:\/\/(www\.)?[-a-zA-Z0-9@:%._\+~#=]{2,256}\.([a-z]{2,6}){1}
                        If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With