Java - Searching For Data within a Website

Tags:

java

I'm new to Java and having some problems.

The main idea is to connect to a website, collect information from it, and store it in an array.

What I want the program to do is search the website, find a keyword, and store what comes after the keyword.

On the front page of daniweb, along the bottom of the page, there is a section called "Tag Cloud" which is filled with tags / short words:

Tag Cloud: "i want to store what is written here"

My idea is to first read in the HTML of the website, then search that file for the keyword followed by the text using Scanner and StringTokenizer, and then store the result as an array.

Is there a better / easier way?

Where do you suggest I look for some examples?

Here is what I have so far.

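import java.io.*;
import java.net.*;
import java.util.Scanner;
import java.util.StringTokenizer;

public class URLReader {

    public static void main(String[] args) throws Exception {

        URL dweb = new URL("http://www.daniweb.com/");
        URLConnection dw = dweb.openConnection();
        // read from the connection opened above (dw, not the undefined hc)
        BufferedReader in = new BufferedReader(new InputStreamReader(dw.getInputStream()));
        System.out.println("connected to daniweb");
        String inputLine;

        PrintStream out = new PrintStream(new FileOutputStream("OutFile.txt"));

        try {
            // copy the page to OutFile.txt line by line
            while ((inputLine = in.readLine()) != null) {
                out.println(inputLine);
            }
            System.out.println("printed text to outfile");
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        } finally {
            in.close();
            out.close();
        }

        try {
            // Scanner needs a File here; a bare String would be scanned as text
            Scanner scan = new Scanner(new File("OutFile.txt"));
            // txtSearch and SearchWin belong to a GUI that is not shown here,
            // so the search word is taken from the command line (assumption)
            String search = args.length > 0 ? args[0] : "Cloud";
            while (scan.hasNextLine()) {
                String line = scan.nextLine();
                //still working
                StringTokenizer st = new StringTokenizer(line);
                while (st.hasMoreTokens()) {
                    String word = st.nextToken();
                    // compare strings with equals(); == only compares references
                    if (word.equals(search)) {

                    } else {

                    }
                }
            }
            scan.close();
        } catch (IOException iox) {
            iox.printStackTrace();
        }
    }
}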

Any help at all would be very much appreciated!

asked Aug 25 '10 12:08

AdianDes

2 Answers

I recommend jsoup. It will retrieve and parse the page for you.

On daniweb, each tag cloud link has the CSS class tagcloudlink. So you just need to tell jsoup to extract all text in tags that have the class tagcloudlink.

This is off the top of my head plus some help from the jsoup site; I haven't tested it but it should get you started:

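import java.util.*;
import org.jsoup.Jsoup;
import org.jsoup.nodes.*;
import org.jsoup.select.Elements;

List<String> tags = new ArrayList<String>();
Document doc = Jsoup.connect("http://daniweb.com/").get();
// select every <a> element that carries the tagcloudlink class
Elements taglinks = doc.select("a.tagcloudlink");
for (Element link : taglinks) {
    tags.add(link.text());
}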
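
If pulling in a library is not an option, roughly the same extraction can be sketched with java.util.regex alone. Treat this as a sketch under assumptions: the class name TagCloudRegex, the method extractTags, and the sample markup are made up for illustration, and it assumes each link is written as <a ... class="tagcloudlink" ...>text</a> with double-quoted attributes and no nested tags. Regular expressions are brittle against real-world HTML, so a parser such as jsoup is still the safer route.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TagCloudRegex {

    // Matches <a ... class="tagcloudlink" ...>text</a>.
    // Assumption: the class attribute holds exactly "tagcloudlink" in double
    // quotes and the link text contains no nested tags.
    private static final Pattern TAG_LINK = Pattern.compile(
            "<a\\b[^>]*\\bclass\\s*=\\s*\"tagcloudlink\"[^>]*>([^<]*)</a>",
            Pattern.CASE_INSENSITIVE);

    // Returns the trimmed text of every matching link, in document order.
    public static List<String> extractTags(String html) {
        List<String> tags = new ArrayList<>();
        Matcher m = TAG_LINK.matcher(html);
        while (m.find()) {
            tags.add(m.group(1).trim());
        }
        return tags;
    }

    public static void main(String[] args) {
        // Hypothetical markup standing in for the real page source.
        String sample = "<div>Tag Cloud: "
                + "<a class=\"tagcloudlink\" href=\"/tags/java\">java</a> "
                + "<a href=\"/about\">about</a> "
                + "<a href=\"/tags/php\" class=\"tagcloudlink\"> php </a></div>";
        System.out.println(extractTags(sample)); // prints [java, php]
    }
}
```

To run it against the live page, the html argument could be the text already read through the BufferedReader in the question, joined into one String instead of being written to a file.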

answered Oct 12 '22 23:10

Jeff

You could use HTML Parser for this. Another one I've used a lot and like is Jericho HTML Parser.

answered Oct 13 '22 00:10

Corv1nus
