
Using JSoup To Extract HTML Table Contents

Tags:

jsoup

How can I extract the contents of the table located at http://espn.go.com/mens-college-basketball/conferences/standings/_/id/2/year/2012/acc-conference?

The few examples I've seen aren't too clear on how to get the contents of the table. Can anyone offer any help?

Johnny Rocket asked Nov 22 '11 04:11




1 Answer

You probably have this solved by now, but this will go over each table and print out the team name and the Win/Loss column. Adjust for the information you need. The second table is obviously formatted differently, so if you want different information from that table, you will have to adjust further. Let me know if you need any more help.

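    import java.io.IOException;
    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;
    import org.jsoup.select.Elements;

    // Run this inside a method that declares "throws IOException", since get() can throw it.
    Document doc = Jsoup.connect("http://espn.go.com/mens-college-basketball/conferences/standings/_/id/2/year/2012/acc-conference").get();

    // Visit every table with the "tablehead" class, then every row in it.
    for (Element table : doc.select("table.tablehead")) {
        for (Element row : table.select("tr")) {
            Elements tds = row.select("td");
            // Only print rows with more than six cells; shorter rows are skipped.
            if (tds.size() > 6) {
                System.out.println(tds.get(0).text() + ":" + tds.get(1).text());
            }
        }
    }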
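
If you want to try the selectors without hitting the live page, here is a minimal sketch that runs the same kind of loop against an inline HTML string via `Jsoup.parse`. The sample markup, the `StandingsSketch` class, the `extractRows` helper, and its `minCells` parameter are all made up for illustration. The real page's structure may differ, so treat this as a starting point rather than a drop-in solution.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class StandingsSketch {

    // Made-up markup for illustration only; it is not copied from the real page.
    static final String SAMPLE_HTML =
        "<table class=\"tablehead\">"
        + "<tr><td colspan=\"3\">Sample Conference</td></tr>"
        + "<tr><td>Team A</td><td>10-2</td><td>20-5</td></tr>"
        + "<tr><td>Team B</td><td>8-4</td><td>18-7</td></tr>"
        + "</table>";

    // Returns "first cell:second cell" for every row that has at least
    // minCells <td> cells. minCells must be 2 or more, because the second
    // cell is read unconditionally once a row passes the check.
    static List<String> extractRows(String html, int minCells) {
        Document doc = Jsoup.parse(html);
        List<String> lines = new ArrayList<>();
        for (Element row : doc.select("table.tablehead tr")) {
            Elements tds = row.select("td");
            if (tds.size() >= minCells) {
                lines.add(tds.get(0).text() + ":" + tds.get(1).text());
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        // The single-cell title row is skipped; the two team rows are printed.
        for (String line : extractRows(SAMPLE_HTML, 3)) {
            System.out.println(line);
        }
    }
}
```

Changing `minCells`, or the indexes passed to `tds.get(...)`, is the kind of adjustment you would make for a table that is laid out differently.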
B. Anderson answered Oct 22 '22 12:10
