Get domain name from given url

People also ask

What is domain from URL?

Simply put, a domain name (or just 'domain') is the name of a website. It's what comes after “@” in an email address, or after “www.” in a web address. If someone asks how to find you online, what you tell them is usually your domain name.

How do I extract a domain from a URL in Google Sheets?

The =REGEXREPLACE() function is built-in Google Sheets and it extracts domains from URLs. What's great about is it's only a simple line of code that you can paste into your cell. The function is not super technical and you can change it any way you see fit.

If you want to parse a URL, use java.net.URI. java.net.URL has a bunch of problems -- its equals method does a DNS lookup which means code using it can be vulnerable to denial of service attacks when used with untrusted inputs.

"Mr. Gosling -- why did you make url equals suck?" explains one such problem. Just get in the habit of using java.net.URI instead.

public static String getDomainName(String url) throws URISyntaxException {
    URI uri = new URI(url);
    String domain = uri.getHost();
    return domain.startsWith("www.") ? domain.substring(4) : domain;
}

should do what you want.

Though It seems to work fine, is there any better approach or are there some edge cases, that could fail.

Your code as written fails for the valid URLs:

httpfoo/bar -- relative URL with a path component that starts with http.
HTTP://example.com/ -- protocol is case-insensitive.
//example.com/ -- protocol relative URL with a host
www/foo -- a relative URL with a path component that starts with www
wwwexample.com -- domain name that does not starts with www. but starts with www.

Hierarchical URLs have a complex grammar. If you try to roll your own parser without carefully reading RFC 3986, you will probably get it wrong. Just use the one that's built into the core libraries.

If you really need to deal with messy inputs that java.net.URI rejects, see RFC 3986 Appendix B:

Appendix B. Parsing a URI Reference with a Regular Expression

As the "first-match-wins" algorithm is identical to the "greedy" disambiguation method used by POSIX regular expressions, it is natural and commonplace to use a regular expression for parsing the potential five components of a URI reference.

The following line is the regular expression for breaking-down a well-formed URI reference into its components.
  ^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?
   12            3  4          5       6  7        8 9
The numbers in the second line above are only to assist readability; they indicate the reference points for each subexpression (i.e., each paired parenthesis).

import java.net.*;
import java.io.*;

public class ParseURL {
  public static void main(String[] args) throws Exception {

    URL aURL = new URL("http://example.com:80/docs/books/tutorial"
                       + "/index.html?name=networking#DOWNLOADING");

    System.out.println("protocol = " + aURL.getProtocol()); //http
    System.out.println("authority = " + aURL.getAuthority()); //example.com:80
    System.out.println("host = " + aURL.getHost()); //example.com
    System.out.println("port = " + aURL.getPort()); //80
    System.out.println("path = " + aURL.getPath()); //  /docs/books/tutorial/index.html
    System.out.println("query = " + aURL.getQuery()); //name=networking
    System.out.println("filename = " + aURL.getFile()); ///docs/books/tutorial/index.html?name=networking
    System.out.println("ref = " + aURL.getRef()); //DOWNLOADING
  }
}

Here is a short and simple line using InternetDomainName.topPrivateDomain() in Guava: InternetDomainName.from(new URL(url).getHost()).topPrivateDomain().toString()

Given http://www.google.com/blah, that will give you google.com. Or, given http://www.google.co.mx, it will give you google.co.mx.

As Sa Qada commented in another answer on this post, this question has been asked earlier: Extract main domain name from a given url. The best answer to that question is from Satya, who suggests Guava's InternetDomainName.topPrivateDomain()

public boolean isTopPrivateDomain()

Indicates whether this domain name is composed of exactly one subdomain component followed by a public suffix. For example, returns true for google.com and foo.co.uk, but not for www.google.com or co.uk.

Warning: A true result from this method does not imply that the domain is at the highest level which is addressable as a host, as many public suffixes are also addressable hosts. For example, the domain bar.uk.com has a public suffix of uk.com, so it would return true from this method. But uk.com is itself an addressable host.

This method can be used to determine whether a domain is probably the highest level for which cookies may be set, though even that depends on individual browsers' implementations of cookie controls. See RFC 2109 for details.

Putting that together with URL.getHost(), which the original post already contains, gives you:

import com.google.common.net.InternetDomainName;

import java.net.URL;

public class DomainNameMain {

  public static void main(final String... args) throws Exception {
    final String urlString = "http://www.google.com/blah";
    final URL url = new URL(urlString);
    final String host = url.getHost();
    final InternetDomainName name = InternetDomainName.from(host).topPrivateDomain();
    System.out.println(urlString);
    System.out.println(host);
    System.out.println(name);
  }
}

I wrote a method (see below) which extracts a url's domain name and which uses simple String matching. What it actually does is extract the bit between the first "://" (or index 0 if there's no "://" contained) and the first subsequent "/" (or index String.length() if there's no subsequent "/"). The remaining, preceding "www(_)*." bit is chopped off. I'm sure there'll be cases where this won't be good enough but it should be good enough in most cases!

Mike Samuel's post above says that the java.net.URI class could do this (and was preferred to the java.net.URL class) but I encountered problems with the URI class. Notably, URI.getHost() gives a null value if the url does not include the scheme, i.e. the "http(s)" bit.

/**
 * Extracts the domain name from {@code url}
 * by means of String manipulation
 * rather than using the {@link URI} or {@link URL} class.
 *
 * @param url is non-null.
 * @return the domain name within {@code url}.
 */
public String getUrlDomainName(String url) {
  String domainName = new String(url);

  int index = domainName.indexOf("://");

  if (index != -1) {
    // keep everything after the "://"
    domainName = domainName.substring(index + 3);
  }

  index = domainName.indexOf('/');

  if (index != -1) {
    // keep everything before the '/'
    domainName = domainName.substring(0, index);
  }

  // check for and remove a preceding 'www'
  // followed by any sequence of characters (non-greedy)
  // followed by a '.'
  // from the beginning of the string
  domainName = domainName.replaceFirst("^www.*?\\.", "");

  return domainName;
}

I made a small treatment after the URI object creation

 if (url.startsWith("http:/")) {
        if (!url.contains("http://")) {
            url = url.replaceAll("http:/", "http://");
        }
    } else {
        url = "http://" + url;
    }
    URI uri = new URI(url);
    String domain = uri.getHost();
    return domain.startsWith("www.") ? domain.substring(4) : domain;

val host = url.split("/")[2]

Related questions
                            
                                Optional orElse Optional in Java
                            
                                Understanding Spliterator, Collector and Stream in Java 8
                            
                                Is there a fixed sized queue which removes excessive elements?
                            
                                How to check type of variable in Java?
                            
                                Spring @Transaction method call by the method within the same class, does not work?
                            
                                What does Serializable mean?
                            
                                Method Overloading for null argument
                            
                                Why does changing the returned variable in a finally block not change the return value?
                            
                                How to convert a LocalDate to an Instant?
                            
                                How to find the length of an array list? [duplicate]
                            
                                Counting Line Numbers in Eclipse [closed]
                            
                                Does adding a duplicate value to a HashSet/HashMap replace the previous value
                            
                                Bold words in a string of strings.xml in Android
                            
                                How to set a JVM TimeZone Properly
                            
                                && (AND) and || (OR) in IF statements
                            
                                Retrieving the inherited attribute names/values using Java Reflection
                            
                                javac option to compile all java files under a given directory recursively
                            
                                Netty vs Apache MINA
                            
                                How to get maximum value from the Collection (for example ArrayList)?
                            
                                How to remove duplicate white spaces in string using Java?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Get domain name from given url

Tags:

java

url

People also ask

Appendix B. Parsing a URI Reference with a Regular Expression

Recent Activity

Donate For Us