Does anyone know of an XPath-to-jsoup converter? I get the following XPath from Chrome:
//*[@id="docs"]/div[1]/h4/a
and would like to convert it into a jsoup selector query. The element at that path has an href I'm trying to get.
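Before converting the XPath, it can help to confirm what it actually points at. The sketch below uses the JDK's built-in javax.xml.xpath (not jsoup) on a hypothetical, well-formed stand-in for the page, because the real markup isn't shown here. The JDK evaluator only accepts well-formed XML/XHTML, which real-world pages often aren't; that is the reason a tolerant parser like jsoup is used in practice. Appending /@href to Chrome's path reads the link's attribute.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class XPathCheck {
    // Hypothetical, well-formed stand-in for the real page (its markup is not shown in the question).
    static final String SAMPLE =
        "<html><body><div id=\"docs\">"
        + "<div><h4><a href=\"/first\">First</a></h4></div>"
        + "<div><h4><a href=\"/second\">Second</a></h4></div>"
        + "</div></body></html>";

    // Evaluates an XPath expression against well-formed markup and returns its string value.
    static String evaluate(String markup, String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(expression, new InputSource(new StringReader(markup)));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Could not evaluate " + expression, e);
        }
    }

    public static void main(String[] args) {
        // Chrome's XPath, plus /@href to read the attribute of the matched <a>.
        // div[1] is the FIRST <div> child: XPath positions start at 1.
        System.out.println(evaluate(SAMPLE, "//*[@id=\"docs\"]/div[1]/h4/a/@href"));
    }
}
```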
With XPath expressions, it is possible to select elements within HTML that has been parsed with jsoup.
An HTML element consists of a tag name, attributes, and child nodes (including text nodes and other elements). From an Element, you can extract data, traverse the node graph, and manipulate the HTML.
jsoup is a Java library for working with real-world HTML. It provides a very convenient API for fetching URLs and extracting and manipulating data, using the best of HTML5 DOM methods and CSS selectors. jsoup implements the WHATWG HTML5 specification, and parses HTML to the same DOM as modern browsers do.
I am using Google Chrome version 47.0.2526.73 m (64-bit), and I can now copy the selector path directly; it is compatible with jsoup.
For example, the copied selector of the span.com element in the screenshot is:
#question > table > tbody > tr:nth-child(1) > td.postcell > div > div.post-text > pre > code > span.com
This is easy to convert by hand. Something like this (not tested):
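// XPath positions start at 1, so div[1] is the first <div> child, i.e. div:nth-of-type(1); jsoup's :eq(n) is 0-based and counts all element siblings
String href = document.select("#docs > div:nth-of-type(1) > h4 > a").attr("href");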
Documentation:
http://jsoup.org/cookbook/extracting-data/selector-syntax
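For simple paths like the one Chrome copies, the manual conversion follows a few mechanical rules, so a converter for that narrow subset can be sketched. SimpleXPathToCss is a hypothetical helper, not part of jsoup. It only handles steps of the form *[@id="x"], tag[@id="x"], tag[n] and tag, separated by single slashes. It maps tag[n] to tag:nth-of-type(n), since an XPath position counts from 1 among same-name siblings. Other predicates, axes, and // in the middle of a path are rejected, and IDs that would need CSS escaping are not handled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SimpleXPathToCss {
    // *[@id="x"] or tag[@id="x"]  ->  #x or tag#x
    private static final Pattern ID_STEP =
        Pattern.compile("(\\*|[a-zA-Z][\\w-]*)\\[@id=[\"']([^\"']+)[\"']\\]");
    // tag[n]  ->  tag:nth-of-type(n)  (both count from 1 among same-name siblings)
    private static final Pattern INDEX_STEP = Pattern.compile("([a-zA-Z][\\w-]*)\\[(\\d+)\\]");
    // plain tag name, e.g. h4 or a
    private static final Pattern TAG_STEP = Pattern.compile("[a-zA-Z][\\w-]*");

    static String convert(String xpath) {
        String path = xpath.trim();
        if (path.startsWith("//")) {
            path = path.substring(2); // CSS selectors already match anywhere in the document
        } else if (path.startsWith("/")) {
            path = path.substring(1); // absolute path such as /html/body/...
        }
        List<String> parts = new ArrayList<>();
        for (String step : path.split("/")) {
            Matcher m;
            if ((m = ID_STEP.matcher(step)).matches()) {
                String tag = m.group(1).equals("*") ? "" : m.group(1);
                parts.add(tag + "#" + m.group(2));
            } else if ((m = INDEX_STEP.matcher(step)).matches()) {
                parts.add(m.group(1) + ":nth-of-type(" + m.group(2) + ")");
            } else if (TAG_STEP.matcher(step).matches()) {
                parts.add(step);
            } else {
                // Empty steps (from a mid-path //) and any other predicate end up here.
                throw new IllegalArgumentException("Unsupported XPath step: '" + step + "'");
            }
        }
        // Each "/" between steps is a parent-child relation, i.e. the CSS child combinator.
        return String.join(" > ", parts);
    }

    public static void main(String[] args) {
        System.out.println(convert("//*[@id=\"docs\"]/div[1]/h4/a"));
    }
}
```

The resulting string can be passed to jsoup's document.select(...). Newer jsoup releases (1.14.3 and later) can also evaluate XPath directly through Element.selectXpath(String).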
I'm trying to get the href of the first result here: cbssports.com/info/search#q=fantasy%20tom%20brady
Code
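import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

// The Solr endpoint returns XML; each result's URL sits in <str name="url">
Elements select = Jsoup.connect("http://solr.cbssports.com/solr/select/?q=fantasy%20tom%20brady")
        .get()
        .select("response > result > doc > str[name=url]");
for (Element element : select) {
    System.out.println(element.text()); // text() avoids html()'s entity escaping (e.g. & -> &amp;)
}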
Result
http://fantasynews.cbssports.com/fantasyfootball/players/playerpage/187741/tom-brady
http://www.cbssports.com/nfl/players/playerpage/187741/tom-brady
http://fantasynews.cbssports.com/fantasycollegefootball/players/playerpage/1825265/brady-lisoski
http://fantasynews.cbssports.com/fantasycollegefootball/players/playerpage/1766777/blake-brady
http://fantasynews.cbssports.com/fantasycollegefootball/players/playerpage/1851211/brady-foltz
http://fantasynews.cbssports.com/fantasycollegefootball/players/playerpage/1860955/brady-earnhardt
http://fantasynews.cbssports.com/fantasycollegefootball/players/playerpage/1673397/brady-amack
Screenshot from the developer console: grabbing the URLs.
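For comparison with the XPath in the question, the same extraction can also be written as an XPath expression. This is a minimal sketch using the JDK's javax.xml.xpath rather than jsoup, run on a hypothetical, trimmed-down Solr response (the sample string and example.com URLs are made up). It reads the text of every str element with name="url" under response/result/doc, and it works only because the Solr response is well-formed XML.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class SolrUrls {
    // Hypothetical, trimmed-down Solr response with the shape the jsoup selector targets.
    static final String SAMPLE =
        "<response><result name=\"response\">"
        + "<doc><str name=\"url\">http://www.example.com/a</str><str name=\"title\">A</str></doc>"
        + "<doc><str name=\"url\">http://www.example.com/b</str></doc>"
        + "</result></response>";

    // XPath counterpart of the CSS selector "response > result > doc > str[name=url]".
    static List<String> urls(String xml) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                "/response/result/doc/str[@name='url']",
                new InputSource(new StringReader(xml)),
                XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getTextContent()); // text content = the URL itself
            }
            return result;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Could not evaluate XPath on input", e);
        }
    }

    public static void main(String[] args) {
        for (String url : urls(SAMPLE)) {
            System.out.println(url);
        }
    }
}
```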