
How to get a specific value from HTML in Java?

Tags: java, html, extract

I am developing an application that shows the gold rate and plots it on a graph.
I found a website that publishes the gold rate regularly. My question is how to extract this specific value from the HTML page.
The page I need to extract from is http://www.todaysgoldrate.co.in/todays-gold-rate-in-pune/ and it contains the following tag and content:

<p><em>10 gram gold Rate in pune = Rs.31150.00</em></p>     

Here is the code I am using for extraction, but I have not found a way to extract the specific content.

public class URLExtractor {

private static class HTMLPaserCallBack extends HTMLEditorKit.ParserCallback {

    private Set<String> urls;

    public HTMLPaserCallBack() {
        urls = new LinkedHashSet<String>();
    }

    public Set<String> getUrls() {
        return urls;
    }

    @Override
    public void handleSimpleTag(Tag t, MutableAttributeSet a, int pos) {
        handleTag(t, a, pos);
    }

    @Override
    public void handleStartTag(Tag t, MutableAttributeSet a, int pos) {
        handleTag(t, a, pos);
    }

    private void handleTag(Tag t, MutableAttributeSet a, int pos) {
        if (t == Tag.A) {
            Object href = a.getAttribute(HTML.Attribute.HREF);
            if (href != null) {
                String url = href.toString();
                if (!urls.contains(url)) {
                    urls.add(url);
                }
            }
        }
    }
}

public static void main(String[] args) throws IOException {
    InputStream is = null;
    try {
        String u = "http://www.todaysgoldrate.co.in/todays-gold-rate-in-pune/";   
        // Here I need to extract this content, either by tag or by content...

Thanks in advance.

Sandip Armal Patil asked Oct 06 '22 03:10


2 Answers

You can use a library like Jsoup.

You can get it from here --> Download Jsoup

Here is its API reference --> Jsoup API Reference

It is really easy to parse HTML content using Jsoup.

Below is some sample code that might be helpful to you.

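import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import javax.swing.JOptionPane;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class GetPTags {

    public static void main(String[] args) {
        // Download the page, parse it, and print the text of every <p> element.
        Document doc = Jsoup.parse(readURL("http://www.todaysgoldrate.co.in/todays-gold-rate-in-pune/"));
        Elements pTags = doc.select("p");
        for (Element p : pTags) {
            System.out.println("P tag is " + p.text());
        }
    }

    // Reads the whole response body into one string.
    // On error, shows a dialog and returns whatever was read so far.
    public static String readURL(String url) {
        StringBuilder fileContents = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(new URL(url).openStream()))) {
            String currentLine;
            // Stop at end of stream so a trailing "null" is never appended.
            while ((currentLine = reader.readLine()) != null) {
                fileContents.append(currentLine).append('\n');
            }
        } catch (Exception e) {
            JOptionPane.showMessageDialog(null, e.getMessage(), "Error Message", JOptionPane.ERROR_MESSAGE);
            e.printStackTrace();
        }
        return fileContents.toString();
    }
}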
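
The sample above prints the text of every p tag, so one more step is needed to pull out the number itself. Here is a minimal sketch of that step using only java.util.regex. It assumes the text always has the form shown in the question ("... = Rs.31150.00"); the class name GoldRateExtractor and the method parseRate are hypothetical names chosen for this illustration, not part of Jsoup. In the loop above you would pass p.text() to parseRate and keep the first non-empty result.

```java
import java.math.BigDecimal;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: extracts the rupee amount from text such as
// "10 gram gold Rate in pune = Rs.31150.00".
final class GoldRateExtractor {

    // Assumed format: "Rs" with an optional dot, then digits with optional
    // thousands separators and an optional decimal part.
    private static final Pattern RATE =
            Pattern.compile("Rs\\.?\\s*([0-9][0-9,]*(?:\\.[0-9]+)?)");

    // Returns the first amount found after "Rs.", or empty if there is none.
    static Optional<BigDecimal> parseRate(String text) {
        Matcher m = RATE.matcher(text);
        if (!m.find()) {
            return Optional.empty();
        }
        // Drop thousands separators before converting to a number.
        return Optional.of(new BigDecimal(m.group(1).replace(",", "")));
    }

    public static void main(String[] args) {
        String text = "10 gram gold Rate in pune = Rs.31150.00";
        System.out.println(parseRate(text)
                .map(BigDecimal::toPlainString)
                .orElse("rate not found"));
    }
}
```

Because the pattern is anchored on "Rs", the leading "10" in "10 gram" is not mistaken for the rate. If the site changes its wording, the pattern will need to be adjusted.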
Pratik answered Oct 10 '22 03:10


http://java-source.net/open-source/crawlers

You can use any of those APIs, but don't parse the HTML with the pure JDK, because it's too painful.

Enrique San Martín answered Oct 10 '22 03:10