
Extract the main domain name from a given URL

I used the following regex to extract the domain from a URL (the strings below are my test cases):

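import java.util.ArrayList;
import java.util.List;

public class ExtractDomain {
    public static void main(String[] args) {
        // Removes one leading label that starts with "ww" ("www.", "ww.", "www-01.", ...) plus its dot
        String regex = "^(ww[a-zA-Z0-9-]{0,}\\.)";
        List<String> cases = new ArrayList<>();
        cases.add("www.google.com");
        cases.add("ww.socialrating.it");
        cases.add("www-01.hopperspot.com");
        cases.add("wwwsupernatural-brasil.blogspot.com");
        cases.add("xtop10.net");
        cases.add("zoyanailpolish.blogspot.com");

        for (String t : cases) {
            String res = t.replaceAll(regex, "");
            System.out.println(res);
        }
    }
}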

I get the following results:

google.com
socialrating.it
hopperspot.com
blogspot.com
xtop10.net
zoyanailpolish.blogspot.com

The first five results are fine, but the last one is not: I want blogspot.com, yet it returns zoyanailpolish.blogspot.com. What am I doing wrong?
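The regex is anchored at the start and only matches a first label that begins with "ww", so it strips "www.", "ww." and "www-01.", and it strips "wwwsupernatural-brasil." only because that label happens to start with "www". The label "zoyanailpolish." never matches, so that host comes back unchanged. A minimal sketch of a different heuristic, assuming "main domain" simply means "the last two dot-separated labels" (an assumption, not something the question states; `lastTwoLabels` is a hypothetical helper): it gives blogspot.com for the last case, but it also shows why label counting is fragile, since it turns www.bbc.co.uk into co.uk.

```java
import java.util.Arrays;
import java.util.List;

public class LastTwoLabels {
    // Heuristic (assumption): treat the last two dot-separated labels as the "main" domain.
    // This is wrong for multi-label public suffixes such as "co.uk".
    static String lastTwoLabels(String host) {
        String[] labels = host.split("\\.");
        if (labels.length <= 2) {
            return host; // already "name.tld" (or a single label)
        }
        return labels[labels.length - 2] + "." + labels[labels.length - 1];
    }

    public static void main(String[] args) {
        List<String> cases = Arrays.asList(
                "www.google.com",
                "ww.socialrating.it",
                "www-01.hopperspot.com",
                "wwwsupernatural-brasil.blogspot.com",
                "xtop10.net",
                "zoyanailpolish.blogspot.com",
                "www.bbc.co.uk"); // counterexample: yields "co.uk"
        for (String host : cases) {
            System.out.println(host + " -> " + lastTwoLabels(host));
        }
    }
}
```

Because the number of labels in a suffix varies (com, it, co.uk, ...), a robust solution needs a list of known suffixes rather than a fixed count.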

asked Aug 27 '11 at 20:08 by chnet





1 Answer

Using the Guava library, you can get the domain name with InternetDomainName:

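import com.google.common.net.InternetDomainName;
String domain = InternetDomainName.from(host).topPrivateDomain().toString(); // host is a bare host name, e.g. "www.google.com"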

See the API documentation for more details:

https://google.github.io/guava/releases/14.0/api/docs/

http://docs.guava-libraries.googlecode.com/git/javadoc/com/google/common/net/InternetDomainName.html
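Guava's InternetDomainName decides where the domain starts by consulting the Public Suffix List rather than by counting labels. The sketch below mimics that idea with the JDK only; the tiny hard-coded `SUFFIXES` set is a hypothetical stand-in for the real list, `topDomain` is an illustrative helper rather than a Guava method, and the real list's wildcard and exception rules are ignored. One caveat worth checking against your Guava version: the real list also contains private suffixes such as blogspot.com, so topPrivateDomain() may return zoyanailpolish.blogspot.com for the last test case, while newer releases (23.0 and later) add topDomainUnderRegistrySuffix(), which considers only registry suffixes and should give blogspot.com.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

public class SuffixDomain {
    // Hypothetical, tiny subset of registry suffixes; the real Public Suffix List is far larger.
    static final Set<String> SUFFIXES = new HashSet<>(Arrays.asList(
            "com", "net", "it", "uk", "co.uk"));

    // Returns the longest known suffix plus the label just before it,
    // or null if the host is itself a suffix or no known suffix matches.
    static String topDomain(String host) {
        String[] labels = host.toLowerCase(Locale.ROOT).split("\\.");
        // Try the longest candidate first, so "co.uk" wins over "uk".
        for (int i = 0; i < labels.length; i++) {
            String suffix = String.join(".", Arrays.copyOfRange(labels, i, labels.length));
            if (SUFFIXES.contains(suffix)) {
                // i == 0 means the whole host is a suffix, so there is no registrable domain.
                return i == 0 ? null : labels[i - 1] + "." + suffix;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String[] hosts = {"www.google.com", "ww.socialrating.it", "www-01.hopperspot.com",
                "wwwsupernatural-brasil.blogspot.com", "xtop10.net",
                "zoyanailpolish.blogspot.com", "www.bbc.co.uk"};
        for (String host : hosts) {
            System.out.println(host + " -> " + topDomain(host));
        }
    }
}
```

Since blogspot.com is deliberately absent from this registry-only set, the sketch returns blogspot.com for zoyanailpolish.blogspot.com and bbc.co.uk for www.bbc.co.uk; in real code, prefer Guava's maintained copy of the list over a hand-written one.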

answered Oct 13 '22 at 13:10 by Satya
