 

How to fetch the base URL from a given URL using Java

I am trying to fetch the base URL using Java. I have used the JTidy parser in my code to fetch the title. I am getting the title properly using JTidy, but I am not getting the base URL from the given URL.

I have some URLs as input:

String s1 = "http://staff.unak.is/andy/GameProgramming0910/new_page_2.htm";
String s2 = "http://www.complex.com/pop-culture/2011/04/10-hottest-women-in-fast-and-furious-movies";

From the first string, I want to fetch "http://staff.unak.is/andy/GameProgramming0910/" as a base URL and from the second string, I want "http://www.complex.com/" as a base URL.

I am using this code:

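import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import org.w3c.dom.Document;
import org.w3c.tidy.Tidy;

URL url = new URL(s1);
HttpURLConnection conn = (HttpURLConnection) url.openConnection();
InputStream in = conn.getInputStream();
Document doc = new Tidy().parseDOM(in, null); // JTidy parses the HTML page into a DOM
String titleText = doc.getElementsByTagName("title").item(0).getFirstChild()
        .getNodeValue();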

I am getting titleText, but can you please tell me how to get the base URL from the URLs given above?

asked May 16 '11 05:05 by DJ31




2 Answers

Try using the java.net.URL class; it will help you:

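For the second case, which is easier, you could use new URL(s2).getHost(). Note that getHost() returns only the host name ("www.complex.com"), so you still have to prepend the protocol plus "://" and append "/" to get "http://www.complex.com/".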
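A minimal sketch of the second case, assuming the input is always an absolute URL. The class and method names (RootUrl, rootOf) are illustrative only. The checked MalformedURLException is wrapped in an unchecked exception so callers don't need a try/catch:

```java
import java.net.MalformedURLException;
import java.net.URL;

public class RootUrl {

    // Builds "protocol://host/" from an absolute URL string.
    // Note: getHost() drops an explicit port such as ":8080".
    static String rootOf(String address) {
        try {
            URL url = new URL(address);
            return url.getProtocol() + "://" + url.getHost() + "/";
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Not a valid URL: " + address, e);
        }
    }

    public static void main(String[] args) {
        String s2 = "http://www.complex.com/pop-culture/2011/04/10-hottest-women-in-fast-and-furious-movies";
        System.out.println(rootOf(s2)); // prints http://www.complex.com/
    }
}
```

If the port matters for your URLs, url.getAuthority() can be used in place of url.getHost(), since it includes the port when one is given.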

For the first case, you could get the host, use the getFile() method as well, and remove everything after the last slash ("/"). Something like this (code not tested):

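import java.net.URL;

URL url = new URL(s1);
String file = url.getFile();
// Keep everything up to and including the last '/', so the trailing slash survives
String base = url.getProtocol() + "://" + url.getHost() + file.substring(0, file.lastIndexOf('/') + 1);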
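The same idea as a complete, runnable program. This sketch deviates from the answer in two assumed ways: it uses getPath() instead of getFile(), so a query string like "?x=1" (which could itself contain a '/') never ends up in the result, and it uses getAuthority() instead of getHost(), so an explicit port is kept. DirectoryUrl and directoryOf are hypothetical names:

```java
import java.net.MalformedURLException;
import java.net.URL;

public class DirectoryUrl {

    // Returns the URL up to and including the last '/' of its path,
    // i.e. the "directory" the resource lives in. Query and fragment are dropped.
    static String directoryOf(String address) {
        try {
            URL url = new URL(address);
            String path = url.getPath(); // path only, without "?query"
            String dir = path.substring(0, path.lastIndexOf('/') + 1);
            // getAuthority() keeps an explicit port such as "localhost:8080"
            return url.getProtocol() + "://" + url.getAuthority() + dir;
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Not a valid URL: " + address, e);
        }
    }

    public static void main(String[] args) {
        String s1 = "http://staff.unak.is/andy/GameProgramming0910/new_page_2.htm";
        System.out.println(directoryOf(s1)); // prints http://staff.unak.is/andy/GameProgramming0910/
    }
}
```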
answered Sep 20 '22 12:09 by Pih


You can use the java.net.URL class to resolve relative URLs.

For the first case, remove the file name from the path:

new URL(new URL(s1), ".").toString()
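Wrapped into a runnable program, the one-liner looks like this. Resolving "." against a URL drops the last path segment (the file name) and keeps the trailing slash; a URL that already ends in "/" comes back unchanged. ResolveDirectory and directoryOf are illustrative names, and the checked exception handling is an assumption about how you'd want errors reported:

```java
import java.net.MalformedURLException;
import java.net.URL;

public class ResolveDirectory {

    // Resolves "." against the given absolute URL, which removes the
    // final path segment but keeps the trailing '/'.
    static String directoryOf(String address) {
        try {
            return new URL(new URL(address), ".").toString();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Not a valid URL: " + address, e);
        }
    }

    public static void main(String[] args) {
        String s1 = "http://staff.unak.is/andy/GameProgramming0910/new_page_2.htm";
        System.out.println(directoryOf(s1)); // prints http://staff.unak.is/andy/GameProgramming0910/
    }
}
```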

For the second case, set the root path:

new URL(new URL(s2), "/").toString()
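The public URL constructors are deprecated as of JDK 20, so on newer Java versions the same relative resolution can be done with java.net.URI instead. A minimal sketch, assuming the inputs are valid absolute URIs (URI.create throws the unchecked IllegalArgumentException otherwise, for example on unescaped spaces); UriBase, directoryOf and rootOf are hypothetical names:

```java
import java.net.URI;

public class UriBase {

    // Directory of the resource: resolve "." against the URI.
    static String directoryOf(String address) {
        return URI.create(address).resolve(".").toString();
    }

    // Root of the site: resolve "/" against the URI.
    static String rootOf(String address) {
        return URI.create(address).resolve("/").toString();
    }

    public static void main(String[] args) {
        String s1 = "http://staff.unak.is/andy/GameProgramming0910/new_page_2.htm";
        String s2 = "http://www.complex.com/pop-culture/2011/04/10-hottest-women-in-fast-and-furious-movies";
        System.out.println(directoryOf(s1)); // prints http://staff.unak.is/andy/GameProgramming0910/
        System.out.println(rootOf(s2));      // prints http://www.complex.com/
    }
}
```

If a URL object is still needed afterwards, the resolved URI can be converted with toURL().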
answered Sep 18 '22 12:09 by Ernesto