
Does jsoup support xpath?

Tags: xpath, jsoup

There's some work in progress on adding XPath support to jsoup: https://github.com/jhy/jsoup/pull/80

  • Is it working?
  • How can I use it?
gguardin asked Aug 16 '11 21:08



2 Answers

jsoup doesn't support XPath yet, but you may try Xsoup ("Jsoup with XPath").

Here's an example quoted from the project's GitHub site:

    @Test
    public void testSelect() {
        String html = "<html><div><a href='https://github.com'>github.com</a></div>" +
                "<table><tr><td>a</td><td>b</td></tr></table></html>";

        Document document = Jsoup.parse(html);

        String result = Xsoup.compile("//a/@href").evaluate(document).get();
        Assert.assertEquals("https://github.com", result);

        List<String> list = Xsoup.compile("//tr/td/text()").evaluate(document).list();
        Assert.assertEquals("a", list.get(0));
        Assert.assertEquals("b", list.get(1));
    }

There you'll also find a list of the XPath features and expressions that Xsoup supports.
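
For comparison, the two expressions in that test can also be run through the JDK's built-in javax.xml.xpath API when the markup happens to be well-formed XML. This is only a sketch: the class name JdkXPathSketch and the helper select are made up for illustration, and it uses neither Xsoup nor jsoup, so it will fail on the messy real-world HTML those libraries are meant to handle.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class JdkXPathSketch {

    // Hypothetical helper (not part of jsoup or Xsoup): evaluates an XPath 1.0
    // expression against well-formed markup and returns the value of each
    // selected attribute or text node.
    static List<String> select(String xml, String expression) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            NodeList nodes = (NodeList) xpath.evaluate(
                    expression,
                    new InputSource(new StringReader(xml)),
                    XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                values.add(nodes.item(i).getNodeValue());
            }
            return values;
        } catch (XPathExpressionException e) {
            // Also raised when the input is not well-formed XML.
            throw new IllegalArgumentException("Could not evaluate: " + expression, e);
        }
    }

    public static void main(String[] args) {
        // Same markup as the Xsoup test above; it is valid XML as written.
        String xml = "<html><div><a href='https://github.com'>github.com</a></div>"
                + "<table><tr><td>a</td><td>b</td></tr></table></html>";

        System.out.println(select(xml, "//a/@href"));      // [https://github.com]
        System.out.println(select(xml, "//tr/td/text()")); // [a, b]
    }
}
```

The strict XML parser is the trade-off here: an unclosed tag or an unquoted attribute makes the call throw, which is the gap that Xsoup fills by running XPath on top of jsoup's lenient parser.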

ollo answered Oct 03 '22 00:10


Not yet, but the JsoupXpath project provides it. For example:

    String html = "<html><body><script>console.log('aaaaa')</script>"
            + "<div class='test'>some body</div><div class='xiao'>Two</div></body></html>";
    JXDocument underTest = JXDocument.create(html);
    String xpath = "//div[contains(@class,'xiao')]/text()";
    JXNode node = underTest.selNOne(xpath);
    Assert.assertEquals("Two", node.asString());

By the way, it supports the complete W3C XPath 1.0 standard syntax, along with extension functions such as num() and allText(). For example:

    //ul[@class='subject-list']/li[./div/div/span[@class='pl']/num()>(1000+90*(2*50))][last()][1]/div/h2/allText()

    //ul[@class='subject-list']/li[not(contains(self::li/div/div/span[@class='pl']//text(),'14582'))]/div/h2//text()
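
The standard pieces of those expressions (predicates, contains(), not(), last()) behave the same way in any XPath 1.0 engine. As a rough illustration, the sketch below runs a few such predicates over the markup from the earlier snippet using only the JDK. The class XPathPredicateSketch and the helper first are hypothetical names, the input must be well-formed XML, and extension steps such as num() or allText() are not expected to work here.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class XPathPredicateSketch {

    // Hypothetical helper (not part of JsoupXpath): parses well-formed markup
    // and returns the string value of the first node the expression selects,
    // or an empty string when nothing matches.
    static String first(String xml, String expression) {
        try {
            Document dom = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return XPathFactory.newInstance().newXPath().evaluate(expression, dom);
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not evaluate: " + expression, e);
        }
    }

    public static void main(String[] args) {
        String xml = "<html><body><script>console.log('aaaaa')</script>"
                + "<div class='test'>some body</div><div class='xiao'>Two</div></body></html>";

        // Attribute test with contains(), as in the JsoupXpath example.
        System.out.println(first(xml, "//div[contains(@class,'xiao')]/text()"));      // Two
        // Negated predicate with not().
        System.out.println(first(xml, "//div[not(contains(@class,'xiao'))]/text()")); // some body
        // Positional predicate: the last div child of its parent.
        System.out.println(first(xml, "//div[last()]/text()"));                       // Two
    }
}
```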
xiaohuo answered Oct 02 '22 23:10