How to get a table from an HTML page using Java

I am working on a project where I am trying to fetch financial statements from the internet and use them in a Java application to automatically create ratios and charts.

The site I am using requires a login and password to get to the tables.
The tag is TBODY, but there are two other TBODYs in the HTML.

How can I use Java to print my table to a text file that I can then use in my application? What would be the best way to go about this, and what should I read up on?

user1093111 asked May 27 '12 02:05



1 Answer

If this were my project, I'd look into using an HTML parser, something like jsoup (although others are available). The jsoup site has a tutorial, and after playing with it a while, you'll likely find it pretty easy to use.

For example, for an HTML table like so:

[Image: an HTML table with ACCOUNT, NAME and BALANCE columns and three customer rows]

jsoup could parse it like so:

import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class TableEg {
   public static void main(String[] args) {
      // URL of the page that holds the example table
      String url = "http://publib.boulder.ibm.com/infocenter/iadthelp/v7r1/topic/" +
            "com.ibm.etools.iseries.toolbox.doc/htmtblex.htm";
      try {
         // Fetch the page and parse it into a DOM
         Document doc = Jsoup.connect(url).get();
         Elements tableElements = doc.select("table");

         // Header cells: th elements inside the thead rows
         Elements tableHeaderEles = tableElements.select("thead tr th");
         System.out.println("headers");
         for (Element header : tableHeaderEles) {
            System.out.println(header.text());
         }
         System.out.println();

         // Table rows; only the td cells of each row are printed
         Elements tableRowElements = tableElements.select(":not(thead) tr");

         for (Element row : tableRowElements) {
            System.out.println("row");
            Elements rowItems = row.select("td");
            for (Element item : rowItems) {
               System.out.println(item.text());
            }
            System.out.println();
         }

      } catch (IOException e) {
         e.printStackTrace();
      }
   }
}

Resulting in the following output:

headers
ACCOUNT
NAME
BALANCE

row
0000001
Customer1
100.00

row
0000002
Customer2
200.00

row
0000003
Customer3
550.00
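
The question also asks about getting the table into a text file. A minimal sketch of that step, using only the standard library, might look like the following. It assumes the cell text has already been collected into a list of rows (for example by adding the text() of each td cell to a list instead of printing it). The class name TableToText, the method names toLines and writeTable, and the file name table.txt are made up for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TableToText {
   // Joins the cells of each row with tabs, giving one line of text per row.
   public static List<String> toLines(List<List<String>> rows) {
      List<String> lines = new ArrayList<>();
      for (List<String> row : rows) {
         lines.add(String.join("\t", row));
      }
      return lines;
   }

   // Writes the rows to a UTF-8 text file, one tab-separated line per row.
   public static void writeTable(Path file, List<List<String>> rows) throws IOException {
      Files.write(file, toLines(rows), StandardCharsets.UTF_8);
   }

   public static void main(String[] args) throws IOException {
      // Sample data copied from the output above; in a real program each
      // inner list would be filled from the parsed table cells.
      List<List<String>> rows = Arrays.asList(
            Arrays.asList("ACCOUNT", "NAME", "BALANCE"),
            Arrays.asList("0000001", "Customer1", "100.00"),
            Arrays.asList("0000002", "Customer2", "200.00"),
            Arrays.asList("0000003", "Customer3", "550.00"));

      // "table.txt" is a hypothetical output file name.
      writeTable(Paths.get("table.txt"), rows);
   }
}
```

A tab-separated file is only one possible layout; it is easy to read back with String.split("\t"), but a CSV library would be a safer choice if the cells themselves can contain tabs or line breaks.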
Hovercraft Full Of Eels answered Nov 07 '22 10:11