How can I extract the contents of the table located at http://espn.go.com/mens-college-basketball/conferences/standings/_/id/2/year/2012/acc-conference?
The few examples I've seen aren't too clear on how to get the contents of the table. Can anyone offer any help?
Jsoup is a Java library for parsing HTML documents. It provides an API for extracting and manipulating data from URLs or HTML documents, using DOM traversal, CSS selectors, and jQuery-like methods.
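To make the CSS-selector side concrete, here is a minimal sketch that parses an inline HTML string instead of fetching a page. The class name `SelectorDemo` and the two-row sample table (with placeholder team names) are hypothetical and are not ESPN's actual markup. The sketch extracts the first cell of each row with a selector, then manipulates the DOM by rewriting one cell's text and adding an attribute.

```java
import java.util.ArrayList;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

class SelectorDemo {
    // Hypothetical sample markup, not the real ESPN page
    static final String SAMPLE = "<table class='tablehead'>"
            + "<tr><td>Team A</td><td>10-6</td></tr>"
            + "<tr><td>Team B</td><td>9-7</td></tr>"
            + "</table>";

    // Extraction: text of the first <td> (sibling index 0) in every row of table.tablehead
    static List<String> firstColumn(Document doc) {
        List<String> names = new ArrayList<>();
        for (Element cell : doc.select("table.tablehead tr > td:eq(0)")) {
            names.add(cell.text());
        }
        return names;
    }

    // Manipulation: rewrite the first cell's text and mark it with an attribute
    static Element renameFirstCell(Document doc, String newText) {
        Element first = doc.selectFirst("table.tablehead td");
        if (first != null) {
            first.text(newText).attr("data-edited", "true");
        }
        return first;
    }

    public static void main(String[] args) {
        Document doc = Jsoup.parse(SAMPLE);
        System.out.println(firstColumn(doc));
        System.out.println(renameFirstCell(doc, "Team A (edited)").outerHtml());
    }
}
```

The edits happen on the in-memory `Document`, so a later `select` sees the changed text.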
XPath expressions can also be used to select elements within the HTML, with jsoup acting as the HTML parser.
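A sketch of the XPath route, assuming a jsoup release recent enough to include `Element.selectXpath` (it was added in 1.14.3). The class name `XpathDemo` and the sample HTML are hypothetical. XPath positions are 1-based, so `td[2]` selects the second cell of each row.

```java
import java.util.ArrayList;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

class XpathDemo {
    // Hypothetical sample markup, not the real ESPN page
    static final String SAMPLE = "<table class='tablehead'>"
            + "<tr><td>Team A</td><td>10-6</td></tr>"
            + "<tr><td>Team B</td><td>9-7</td></tr>"
            + "</table>";

    // Collects the second cell of every row in table.tablehead using XPath.
    // "//tr" (descendant) also matches rows inside the <tbody> the parser inserts.
    static List<String> secondColumn(String html) {
        Document doc = Jsoup.parse(html);
        List<String> values = new ArrayList<>();
        for (Element cell : doc.selectXpath("//table[@class='tablehead']//tr/td[2]")) {
            values.add(cell.text());
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println(secondColumn(SAMPLE));
    }
}
```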
By calling jsoup methods from JavaScript or Python code, you can parse a web page or HTML string into a DOM model, then traverse the DOM to find the elements you need.
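In Java, the same parse-then-traverse flow might look like the following sketch. It calls `Jsoup.parse` on an HTML string (for a live page, `Jsoup.connect(url).get()` returns the same kind of `Document`) and walks the tree with `getElementsByTag` and `children()` rather than CSS selectors. `TraverseDemo` and `tableRows` are hypothetical names, and the sketch only reads the first `<table>` in the document.

```java
import java.util.ArrayList;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

class TraverseDemo {
    // Parses an HTML string and walks the DOM of its first <table>,
    // returning each row as a list of cell texts (both <th> and <td>).
    static List<List<String>> tableRows(String html) {
        Document doc = Jsoup.parse(html);
        List<List<String>> rows = new ArrayList<>();
        Element table = doc.getElementsByTag("table").first();
        if (table == null) {
            return rows; // no table in the document
        }
        for (Element row : table.getElementsByTag("tr")) {
            List<String> cells = new ArrayList<>();
            for (Element cell : row.children()) {
                cells.add(cell.text());
            }
            rows.add(cells);
        }
        return rows;
    }

    public static void main(String[] args) {
        String html = "<table><tr><th>Team</th><th>W-L</th></tr>"
                + "<tr><td>Team A</td><td>10-6</td></tr></table>";
        for (List<String> row : tableRows(html)) {
            System.out.println(String.join(" | ", row));
        }
    }
}
```

Because `getElementsByTag` searches all descendants, rows of any table nested inside the first one would be included as well.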
You probably have this solved by now, but this will go over each table and print out the team name and the Win/Loss column. Adjust for the information you need. The second table is obviously formatted differently, so if you want different information from that table, you will have to adjust further. Let me know if you need any more help.
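```java
import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.*;
import org.jsoup.select.Elements;

public class AccStandings {
    public static void main(String[] args) throws IOException {
        Document doc = Jsoup.connect("http://espn.go.com/mens-college-basketball/conferences/standings/_/id/2/year/2012/acc-conference").get();
        for (Element table : doc.select("table.tablehead")) { // every table with class "tablehead"
            for (Element row : table.select("tr")) {
                Elements tds = row.select("td");
                if (tds.size() > 6) { // only rows with more than 6 cells are treated as team rows
                    System.out.println(tds.get(0).text() + ":" + tds.get(1).text());
                }
            }
        }
    }
}
```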
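If you want to keep the results rather than print them, one option is to separate parsing from fetching, which also lets you test against saved HTML. The sketch below reuses the answer's assumptions (tables with class `tablehead`, team rows with more than 6 `<td>` cells, the team name in the first cell and the W-L record in the second); the class name `StandingsParser` and the sample markup are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

class StandingsParser {
    // Same selection logic as the answer's loop, but collects team -> W-L instead of printing.
    // Assumes team rows have more than 6 <td> cells: name in cell 0, record in cell 1.
    static Map<String, String> parse(Document doc) {
        Map<String, String> records = new LinkedHashMap<>();
        for (Element table : doc.select("table.tablehead")) {
            for (Element row : table.select("tr")) {
                Elements tds = row.select("td");
                if (tds.size() > 6) {
                    records.put(tds.get(0).text(), tds.get(1).text());
                }
            }
        }
        return records;
    }

    public static void main(String[] args) {
        // Hypothetical saved markup; pass Jsoup.connect(url).get() instead for a live page
        String html = "<table class='tablehead'>"
                + "<tr><td colspan='7'>Conference</td></tr>"
                + "<tr><td>Team A</td><td>10-6</td><td>1</td><td>2</td><td>3</td><td>4</td><td>5</td></tr>"
                + "</table>";
        System.out.println(parse(Jsoup.parse(html)));
    }
}
```

A `LinkedHashMap` keeps the teams in page order. If the second table repeats team names, its values overwrite those from the first, so narrow the selection to a single table if that matters.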