
Convert XPath to jsoup query

Tags: xpath, jsoup

Does anyone know of an XPath to jsoup converter? I get the following XPath from Chrome:

 //*[@id="docs"]/div[1]/h4/a

and would like to turn it into a jsoup query. The element at that path has an href attribute I'm trying to read.

Josh asked May 02 '13 10:05


2 Answers

I am using Google Chrome Version 47.0.2526.73 m (64-bit), and I can now copy the selector path directly from the developer tools. The copied selector is compatible with jsoup.

Chrome with Selector option



The selector copied for the span.com element in the screenshot is:

#question > table > tbody > tr:nth-child(1) > td.postcell > div > div.post-text > pre > code > span.com

zackygaurav answered Sep 20 '22 19:09


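This is easy to convert manually.

Something like this (not tested). Note that XPath positions are 1-based, so div[1] means the first div child, while jsoup's :eq(n) uses a 0-based sibling index. The closest equivalent is div:nth-of-type(1):

document.select("#docs > div:nth-of-type(1) > h4 > a").attr("href");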

Documentation:

http://jsoup.org/cookbook/extracting-data/selector-syntax
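
For simple paths like the one in the question, the manual conversion can also be automated. The following is only a rough sketch, not part of jsoup: the class name XPathToCss and its convert method are made up for illustration. It assumes each step is a plain tag name or *, optionally followed by a single [@id="..."] or [n] predicate, and that ids need no CSS escaping. Anything else (attribute tests other than id, // in the middle of the path, functions) is rejected.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: converts a small subset of XPath to a CSS selector string.
public class XPathToCss {

    // One step: a tag name or "*", optionally followed by exactly one predicate,
    // either [@id="..."] or a numeric position [n].
    private static final Pattern STEP = Pattern.compile(
            "(\\*|[A-Za-z][\\w-]*)(?:\\[@id=[\"']([^\"']+)[\"']\\]|\\[(\\d+)\\])?");

    public static String convert(String xpath) {
        String path = xpath.trim();
        // A leading "//" or "/" is dropped; a CSS selector matches anywhere in the document.
        if (path.startsWith("//")) {
            path = path.substring(2);
        } else if (path.startsWith("/")) {
            path = path.substring(1);
        }

        List<String> parts = new ArrayList<>();
        for (String step : path.split("/")) {
            Matcher m = STEP.matcher(step);
            if (!m.matches()) {
                throw new IllegalArgumentException("Unsupported XPath step: '" + step + "'");
            }
            String tag = m.group(1);
            String id = m.group(2);
            String position = m.group(3);

            StringBuilder css = new StringBuilder();
            if (id != null) {
                // *[@id="docs"] becomes #docs, div[@id="docs"] becomes div#docs
                if (!tag.equals("*")) {
                    css.append(tag);
                }
                css.append('#').append(id);
            } else {
                css.append(tag);
                if (position != null) {
                    // XPath positions are 1-based, like :nth-of-type and :nth-child.
                    css.append(tag.equals("*") ? ":nth-child(" : ":nth-of-type(")
                       .append(position).append(')');
                }
            }
            parts.add(css.toString());
        }
        // "/" between steps is the child axis, which maps to the ">" combinator.
        return String.join(" > ", parts);
    }
}
```

With the XPath from the question, XPathToCss.convert("//*[@id=\"docs\"]/div[1]/h4/a") returns "#docs > div:nth-of-type(1) > h4 > a", which can then be passed to document.select(...).attr("href") as shown above.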


Related question from comment

Trying to get the href for the first result here: cbssports.com/info/search#q=fantasy%20tom%20brady

Code

Elements select = Jsoup.connect("http://solr.cbssports.com/solr/select/?q=fantasy%20tom%20brady")
        .get()
        .select("response > result > doc > str[name=url]");

for (Element element : select) {
    System.out.println(element.html());
}

Result

http://fantasynews.cbssports.com/fantasyfootball/players/playerpage/187741/tom-brady
http://www.cbssports.com/nfl/players/playerpage/187741/tom-brady
http://fantasynews.cbssports.com/fantasycollegefootball/players/playerpage/1825265/brady-lisoski
http://fantasynews.cbssports.com/fantasycollegefootball/players/playerpage/1766777/blake-brady
http://fantasynews.cbssports.com/fantasycollegefootball/players/playerpage/1851211/brady-foltz
http://fantasynews.cbssports.com/fantasycollegefootball/players/playerpage/1860955/brady-earnhardt
http://fantasynews.cbssports.com/fantasycollegefootball/players/playerpage/1673397/brady-amack

Screenshot from Developer Console - grabbing urls

MariuszS answered Sep 20 '22 19:09