
How to identify the top-level domain of a URL object using Java?

Given this:

URL u = new URL("someURL");

How do I identify the top-level domain of the URL?

trinity asked Jan 26 '10 17:01


3 Answers

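Guava provides a nice utility for this. It works as follows:

InternetDomainName.from("someurl.co.uk").publicSuffix() will get you co.uk
InternetDomainName.from("someurl.de").publicSuffix() will get you de

Note that from() expects a host name rather than a full URL, so with a URL object you would pass in u.getHost().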

jeremie answered Oct 19 '22 10:10


So you want to have the top-level domain part only?

import java.net.MalformedURLException;
import java.net.URL;

// parameter urlString: a String
// returns: the last dot-separated label of the host in urlString (the TLD), or null iff urlString is malformed
private String getTldString(String urlString) {
    String tldString = null;
    try {
        URL url = new URL(urlString);
        String[] domainNameParts = url.getHost().split("\\.");
        tldString = domainNameParts[domainNameParts.length - 1];
    } catch (MalformedURLException e) {
        // malformed (or null) input: fall through and return null
    }

    return tldString;
}

Let's test it!

@Test 
public void identifyLocale() {
    String ukString = "http://www.amazon.co.uk/Harry-Potter-Sheet-Complete-Series/dp/0739086731";
    logger.debug("ukString TLD: {}", getTldString(ukString));

    String deString = "http://www.amazon.de/The-Essential-George-Gershwin/dp/B00008GEOT";
    logger.debug("deString TLD: {}", getTldString(deString));

    String ceShiString = "http://例子.测试";
    logger.debug("ceShiString TLD: {}", getTldString(ceShiString));

    String dokimeString = "http://παράδειγμα.δοκιμή";
    logger.debug("dokimeString TLD: {}", getTldString(dokimeString));

    String nullString = null;
    logger.debug("nullString TLD: {}", getTldString(nullString));

    String lolString = "lol, this is a malformed URL, amirite?!";
    logger.debug("lolString TLD: {}", getTldString(lolString));

}

Output:

ukString TLD: uk
deString TLD: de
ceShiString TLD: 测试
dokimeString TLD: δοκιμή
nullString TLD: null
lolString TLD: null

Abdull answered Oct 19 '22 09:10


According to the docs, the host part of the URL conforms to RFC 2732, so u.getHost() can return a literal IPv6 address enclosed in square brackets rather than a domain name. This means that simply splitting the string you get from

  String host = u.getHost();

is not enough in general. You need to handle the host forms that RFC 2732 allows when searching the host, or, if you can guarantee that all addresses are of the form server.com, you can search for the last . in the string and take everything after it as the TLD.
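A minimal sketch of that check, using only the standard library. The class name HostTldSketch and the method tldOf are hypothetical names chosen for illustration, and the IPv4 test is a deliberately rough pattern match rather than full address validation. Like the last-dot approach described above, it returns only the final label (uk, not co.uk).

```java
import java.net.MalformedURLException;
import java.net.URL;

// Hypothetical helper for illustration; not part of any library.
final class HostTldSketch {

    /** Returns the text after the last dot of the host, or null when the host has no TLD. */
    static String tldOf(String urlString) {
        String host;
        try {
            host = new URL(urlString).getHost();
        } catch (MalformedURLException e) {
            return null; // not a parseable URL
        }
        // Per RFC 2732, a literal IPv6 address comes back enclosed in square brackets.
        if (host.isEmpty() || host.startsWith("[")) {
            return null;
        }
        // Rough check for a dotted-quad IPv4 address, which has no TLD either.
        if (host.matches("\\d{1,3}(\\.\\d{1,3}){3}")) {
            return null;
        }
        int lastDot = host.lastIndexOf('.');
        // A single-label host such as "localhost" contains no dot, hence no TLD.
        return lastDot < 0 ? null : host.substring(lastDot + 1);
    }

    public static void main(String[] args) {
        System.out.println(tldOf("http://www.amazon.co.uk/index.html"));
        System.out.println(tldOf("http://[1080::8:800:200C:417A]/index.html"));
        System.out.println(tldOf("http://192.168.0.1/"));
    }
}
```

Here the first call yields uk, while the IPv6 and IPv4 hosts yield null because there is no domain name to take a TLD from.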

Vincent Ramdhanie answered Oct 19 '22 08:10