 

How to parse a table from HTML using jsoup


<td width="10"></td>
<td width="65"><img src="/images/sparks/NIFTY.png" /></td>
<td width="65">5,390.85</td>
<td width="65">5,428.15</td>
<td width="65">5,376.15</td>
<td width="65">5,413.85</td>

This is the HTML source from which I have to extract the values 5390.85, 5428.15, 5376.15, and 5413.85. I want to do this using jsoup, but I am new to it (I started using it today). How should I do this?

URL url = new URL("http://www.nseindia.com/content/equities/niftysparks.htm");
Document doc = Jsoup.parse(url, 3 * 1000);
String text = doc.body().text();

I have already fetched the content of the page using jsoup, but how do I extract the values I need? Thanks in advance.

CyprUS asked Mar 22 '11 18:03




1 Answer

Try something like this:

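import java.net.URL;
import java.util.Iterator;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

URL url = new URL("http://www.nseindia.com/content/equities/niftysparks.htm");
Document doc = Jsoup.parse(url, 3000); // fetch and parse with a 3-second timeout

// Select the table with class "niftyd", then its cells with width="65"
Element table = doc.select("table[class=niftyd]").first();
Iterator<Element> ite = table.select("td[width=65]").iterator();

ite.next(); // first one is image, skip it

System.out.println("Value 1: " + ite.next().text());
System.out.println("Value 2: " + ite.next().text());
System.out.println("Value 3: " + ite.next().text());
System.out.println("Value 4: " + ite.next().text());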

Here's the printout:

Value 1: 5,390.85
Value 2: 5,428.15
Value 3: 5,376.15
Value 4: 5,413.85
limc answered Sep 19 '22 13:09
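The question asks for the values as 5390.85, 5428.15, and so on, while the text in the cells still carries a thousands separator ("5,390.85"). A minimal sketch of converting that text to numbers, using only the Java standard library, might look like the following. The class name `NiftyValues` and the helpers `toNumber` and `toNumbers` are hypothetical names chosen for illustration, and the code assumes the values always use a comma as the thousands separator and a dot as the decimal point, as in the sample HTML.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class NiftyValues {

    // Hypothetical helper: turns cell text such as "5,390.85" into 5390.85.
    // Assumes "," is only ever a thousands separator and "." the decimal point.
    static BigDecimal toNumber(String cellText) {
        String cleaned = cellText.trim().replace(",", "");
        // BigDecimal throws NumberFormatException if the text is not numeric
        return new BigDecimal(cleaned);
    }

    // Hypothetical helper: converts every cell text in the list, in order.
    static List<BigDecimal> toNumbers(List<String> cellTexts) {
        List<BigDecimal> values = new ArrayList<>();
        for (String cellText : cellTexts) {
            values.add(toNumber(cellText));
        }
        return values;
    }

    public static void main(String[] args) {
        // The four strings from the sample HTML; in practice these would
        // come from the text() of each selected td element.
        List<String> cells = Arrays.asList("5,390.85", "5,428.15", "5,376.15", "5,413.85");
        System.out.println(toNumbers(cells)); // prints [5390.85, 5428.15, 5376.15, 5413.85]
    }
}
```

`BigDecimal` is used here rather than `double` so that the quoted prices keep their exact decimal digits; if approximate arithmetic is acceptable, `Double.parseDouble(cleaned)` would work on the same cleaned string.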