
Extract and Parse HTML Table using Jsoup

How can I use Jsoup to extract the specification data from this website, one row at a time, e.g. Network->Network Type, Battery, etc.?

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class mobilereviews {
    public static void main(String[] args) throws Exception {
        Document doc = Jsoup.connect("http://mobilereviews.net/details-for-Motorola%20L7.htm").get();
        for (Element table : doc.select("table")) {
            for (Element row : table.select("tr")) {
                Elements tds = row.select("td");
                System.out.println(tds.get(0).text());   
            }
        }
    }
}
asked Feb 16 '26 02:02 by KNU


1 Answer

Here is one way to solve your problem:

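Document doc = Jsoup.connect("http://mobilereviews.net/details-for-Motorola%20L7.htm").get();

// Only the specification table, not the other tables on the page
for (Element table : doc.select("table[id=phone_details]")) {
    // Rows whose zero-based sibling index is greater than 2
    for (Element row : table.select("tr:gt(2)")) {
        // Drop the category cell that spans several rows
        Elements tds = row.select("td:not([rowspan])");
        if (tds.size() < 2) continue; // skip rows without a name/value pair
        System.out.println(tds.get(0).text() + "->" + tds.get(1).text());
    }
}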

Parsing HTML is brittle: the selectors depend on the page's markup, so if the markup changes, your code has to change with it.

Study the HTML markup first, then derive your parsing rules from it:

  • There are multiple tables in the HTML, so first select the right one: table[id=phone_details]
  • The leading table rows contain only formatting markup, so skip them: tr:gt(2) keeps only rows whose zero-based sibling index is greater than 2
  • Every other row starts with a cell holding the general category for that group of rows; filter it out: td:not([rowspan])
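
To see the three rules working together without hitting the network, here is a minimal sketch that runs the same selectors against an inline HTML string. The markup in SAMPLE_HTML, the class name SelectorSketch, and the method extractPairs are all invented for illustration; the real page's structure may differ, so treat this as a way to experiment with the selectors, not as a description of the site.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import java.util.ArrayList;
import java.util.List;

public class SelectorSketch {
    // Hypothetical markup: one unrelated table, then a details table with
    // three formatting rows followed by the actual specification rows.
    static final String SAMPLE_HTML =
        "<table id=\"layout\"><tr><td>Menu</td><td>Home</td></tr></table>"
      + "<table id=\"phone_details\">"
      + "<tr><td colspan=\"3\">Motorola L7</td></tr>"
      + "<tr><td colspan=\"3\">Specifications</td></tr>"
      + "<tr><td colspan=\"3\">Details</td></tr>"
      + "<tr><td rowspan=\"2\">Network</td><td>Network Type</td><td>GSM</td></tr>"
      + "<tr><td>Bands</td><td>Quad band</td></tr>"
      + "<tr><td rowspan=\"1\">Battery</td><td>Type</td><td>Li-Ion</td></tr>"
      + "</table>";

    // Returns one "name->value" string per specification row.
    static List<String> extractPairs(String html) {
        Document doc = Jsoup.parse(html);
        List<String> pairs = new ArrayList<>();
        // Rule 1: pick the details table by its id.
        for (Element table : doc.select("table[id=phone_details]")) {
            // Rule 2: keep rows with zero-based sibling index > 2.
            for (Element row : table.select("tr:gt(2)")) {
                // Rule 3: ignore the category cell carrying a rowspan attribute.
                Elements tds = row.select("td:not([rowspan])");
                if (tds.size() < 2) {
                    continue; // not a name/value row
                }
                pairs.add(tds.get(0).text() + "->" + tds.get(1).text());
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        for (String pair : extractPairs(SAMPLE_HTML)) {
            System.out.println(pair);
        }
    }
}
```

Swapping SAMPLE_HTML for markup copied from the real page is a quick way to check whether the row offset and the rowspan filter still match before running against the live site.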

For more complex selector options, see the jsoup cookbook: http://jsoup.org/cookbook/extracting-data/selector-syntax
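
If you also want the category in front of each row, as in the Network->Network Type format from the question, one possible approach is to remember the most recent rowspan cell instead of discarding it. This is only a sketch under the assumption that each category cell carries a rowspan attribute and appears in the first row of its group; the sample markup and the names CategorySketch and extractWithCategory are hypothetical.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import java.util.ArrayList;
import java.util.List;

public class CategorySketch {
    // Hypothetical markup, invented for illustration only.
    static final String SAMPLE_HTML =
        "<table id=\"phone_details\">"
      + "<tr><td colspan=\"3\">Motorola L7</td></tr>"
      + "<tr><td colspan=\"3\">Specifications</td></tr>"
      + "<tr><td colspan=\"3\">Details</td></tr>"
      + "<tr><td rowspan=\"2\">Network</td><td>Network Type</td><td>GSM</td></tr>"
      + "<tr><td>Bands</td><td>Quad band</td></tr>"
      + "<tr><td rowspan=\"1\">Battery</td><td>Type</td><td>Li-Ion</td></tr>"
      + "</table>";

    // Returns one "category->name: value" string per specification row.
    static List<String> extractWithCategory(String html) {
        Document doc = Jsoup.parse(html);
        List<String> lines = new ArrayList<>();
        String category = ""; // last category cell seen so far
        for (Element row : doc.select("table#phone_details tr:gt(2)")) {
            // A cell with a rowspan attribute starts a new category.
            Element header = row.selectFirst("td[rowspan]");
            if (header != null) {
                category = header.text();
            }
            Elements cells = row.select("td:not([rowspan])");
            if (cells.size() < 2) {
                continue; // not a name/value row
            }
            lines.add(category + "->" + cells.get(0).text() + ": " + cells.get(1).text());
        }
        return lines;
    }

    public static void main(String[] args) {
        extractWithCategory(SAMPLE_HTML).forEach(System.out::println);
    }
}
```

Note that selectFirst requires a reasonably recent jsoup release; on older versions, row.select("td[rowspan]").first() does the same job.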

answered Feb 18 '26 15:02 by Joey


