I have this HTML code:
<td class="topic starter"><a href="http://www.test.com">Title</a></td>
I want to extract "Title" and the URL, so I did this:
Elements titleUrl = doc.getElementsByAttributeValue("class", "topic starter");
String title = titleUrl.text();
And this works for the title, but for the URL I tried the following:
String url = titleUrl.html();
String url = titleUrl.attr("a [href]");
String url = titleUrl.attr("a[href]");
String url = titleUrl.attr("href");
String url = titleUrl.attr("a");
But none of these work, and I'm not able to get the URL.
attr("abs:href") returns the absolute URL after resolving it against the document's base URI.
link.absUrl("href") returns the same absolute URL, also resolved against the document's base URI.
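To make the base URI behaviour concrete, here is a minimal sketch. It assumes jsoup is on the classpath; the class name AbsUrlExample, the helper firstAbsoluteHref, and the relative link "/page" are made up for illustration and are not from the question.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Hypothetical example class, not part of the original question.
public class AbsUrlExample {

    // Returns the absolute form of the first link's href, resolved against baseUri.
    // Returns an empty string if the HTML contains no <a href> element.
    static String firstAbsoluteHref(String html, String baseUri) {
        // The second argument sets the document's base URI.
        Document doc = Jsoup.parse(html, baseUri);
        Element link = doc.select("a[href]").first();
        return link == null ? "" : link.absUrl("href");
    }

    public static void main(String[] args) {
        // Illustrative relative link.
        String html = "<a href=\"/page\">Title</a>";
        System.out.println(firstAbsoluteHref(html, "http://www.test.com/"));
        // prints http://www.test.com/page
    }
}
```

If the href is already absolute, as in the question, attr("href") and absUrl("href") should give the same value, so the distinction only matters for relative links.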
XPath expressions can also be used to select elements within the HTML, with Jsoup as the HTML parser.
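A rough sketch of the XPath route, assuming a jsoup version that provides selectXpath (as far as I know it was added in 1.14.3). The class name XPathLinkExample and the helper hrefViaXPath are hypothetical, and the td is wrapped in a table here because the HTML parser discards a bare td that is not inside a table.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Hypothetical example class, not part of the original question.
public class XPathLinkExample {

    // Returns the href of the first <a> directly inside a <td> whose class
    // attribute is exactly "topic starter", or an empty string if none matches.
    static String hrefViaXPath(String html) {
        Document doc = Jsoup.parse(html);
        // Assumes selectXpath is available in the jsoup version in use.
        Element link = doc.selectXpath("//td[@class='topic starter']/a").first();
        return link == null ? "" : link.attr("href");
    }

    public static void main(String[] args) {
        String html = "<table><tr><td class=\"topic starter\">"
                + "<a href=\"http://www.test.com\">Title</a></td></tr></table>";
        System.out.println(hrefViaXPath(html));
    }
}
```

Note that the XPath test @class='topic starter' matches the attribute string exactly, whereas the CSS selector td.topic.starter matches the two classes in any order.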
Try this:
Element link = doc.select("td.topic.starter > a").first();
String url = link.attr("href");
You first select the a element and then extract its href attribute.
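For completeness, here is a self-contained sketch of the same approach that returns both the title and the URL. The class name TopicLinkExample and the helper extractTitleAndUrl are hypothetical, and the surrounding table markup is an assumption about the real page (the parser drops a td that is not inside a table). The attempts in the question come back empty because attr("href") is being asked of the td elements, which have no href; the attribute lives on the nested a element.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Hypothetical example class, not part of the original question.
public class TopicLinkExample {

    // Returns {title, url} for the first link directly inside td.topic.starter,
    // or null if no such link exists.
    static String[] extractTitleAndUrl(String html) {
        Document doc = Jsoup.parse(html);
        Element link = doc.select("td.topic.starter > a").first();
        if (link == null) {
            return null;
        }
        return new String[] { link.text(), link.attr("href") };
    }

    public static void main(String[] args) {
        // Assumed wrapper markup around the td from the question.
        String html = "<table><tr><td class=\"topic starter\">"
                + "<a href=\"http://www.test.com\">Title</a></td></tr></table>";
        String[] result = extractTitleAndUrl(html);
        System.out.println(result[0]); // prints Title
        System.out.println(result[1]); // prints http://www.test.com
    }
}
```

Checking for null before calling attr avoids a NullPointerException on pages where the cell or the link is missing.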