
How to extract absolute URL from relative HTML links using Jsoup?

I am using Jsoup to extract the URLs from a web page. The href attributes of those links are relative, like:

<a href="/text">example</a>

Here is my attempt:

Document document = Jsoup.connect(url).get();
Elements results = document.select("div.results");
Elements dls = results.select("dl");
for (Element dl : dls) {
    String url = dl.select("a").attr("href");
}

This works fine, but when I use

String url = dl.select("a").attr("abs:href");

to get an absolute URL such as http://example.com/text, it does not work. How can I get the absolute URL?

sundhar asked Nov 10 '10 12:11


2 Answers

You need Element#absUrl().

String url = dl.select("a").absUrl("href");

By the way, you can shorten the select to a single query:

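Document document = Jsoup.connect(url).get();
// One selector replaces the chained select() calls.
Elements links = document.select("div.results dl a");
for (Element link : links) {
    // absUrl() resolves the href against the document's base URI.
    String absoluteUrl = link.absUrl("href");
}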
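
For intuition about what absUrl("href") has to produce, here is a minimal sketch of resolving a relative href against a page URL using only the JDK's java.net.URI. This is not jsoup's implementation; the class name AbsoluteUrlSketch, the helper toAbsolute, and the example page URL are assumptions made for illustration. Like absUrl, the helper returns an empty string when no absolute URL can be formed.

```java
import java.net.URI;

public class AbsoluteUrlSketch {

    // Hypothetical helper: resolves href against baseUrl.
    // Returns "" when the input is malformed or the result is not absolute.
    public static String toAbsolute(String baseUrl, String href) {
        try {
            URI resolved = URI.create(baseUrl).resolve(href);
            return resolved.isAbsolute() ? resolved.toString() : "";
        } catch (IllegalArgumentException e) {
            // URI.create and resolve(String) reject syntactically invalid input.
            return "";
        }
    }

    public static void main(String[] args) {
        // Assumed page URL, standing in for the URL the document was fetched from.
        String base = "http://example.com/search/results.html";

        System.out.println(toAbsolute(base, "/text"));                  // http://example.com/text
        System.out.println(toAbsolute(base, "more.html"));              // http://example.com/search/more.html
        System.out.println(toAbsolute(base, "http://other.example/x")); // http://other.example/x
    }
}
```

In jsoup the base URL comes from the document itself: Jsoup.connect(url).get() records the fetched location as the base URI, whereas HTML parsed from a string has a base URI only if one is passed to Jsoup.parse(html, baseUri). Without a base URI, a relative href cannot be made absolute and absUrl returns an empty string.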

BalusC answered Oct 03 '22 23:10


String url = dl.select("a").absUrl("href");

is not correct, because dl.select("a") returns a collection (Elements), not a single Element. You need to pick an element from that collection by index, e.g.:

Elements elems = dl.select("a");
Element a1 = elems.get(0); // valid indexes run from 0 to elems.size() - 1

Now you can do:

String url = a1.absUrl("href");

If you are sure that the select above returns only one item, or that the item you want is the first one, you can write:

String url = dl.select("a").get(0).absUrl("href"); 

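This is equivalent whenever at least one element matches (on an empty result, get(0) throws IndexOutOfBoundsException, whereas first() returns null):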

String url = dl.select("a").first().absUrl("href");

It doesn't have to be the first element: you can replace the 0 in get(0) with the index of the element you want, or use a more specific selector that matches only one element.

tindase answered Oct 03 '22 23:10