I'm new to Java and having some problems.
The main idea is to connect to a website, collect information from it, and store it in an array.
What I want the program to do is search the website, find a keyword, and store what comes after the keyword.
On the front page of DaniWeb, along the bottom, there is a section called "Tag Cloud" which is filled with tags (short words):
Tag Cloud: "I want to store what is written here"
My idea is to first read in the HTML of the website, then search that file for the keyword followed by the text using Scanner and StringTokenizer, and store the result as an array.
Is there a better or easier way?
Where do you suggest I look for some examples?
Here is what I have so far:
import java.io.*;
import java.net.*;
import java.util.*;

public class URLReader {
    public static void main(String[] args) throws Exception {
        URL dweb = new URL("http://www.daniweb.com/");
        URLConnection dw = dweb.openConnection();
        // Copy the page's HTML into OutFile.txt; try-with-resources closes both streams.
        try (BufferedReader in = new BufferedReader(new InputStreamReader(dw.getInputStream()));
             PrintStream out = new PrintStream(new FileOutputStream("OutFile.txt"))) {
            System.out.println("connected to daniweb");
            String inputLine;
            while ((inputLine = in.readLine()) != null) {
                out.println(inputLine);
            }
            System.out.println("printed text to outfile");
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        }

        // Placeholder keyword: the original read it from a text field (txtSearch) that is not shown here.
        String search = "Tag";
        List<String> found = new ArrayList<String>();
        try (Scanner scan = new Scanner(new File("OutFile.txt"))) {
            while (scan.hasNextLine()) {
                String line = scan.nextLine();
                //still working
                StringTokenizer st = new StringTokenizer(line);
                while (st.hasMoreTokens()) {
                    String word = st.nextToken();
                    // Strings must be compared with equals(); == only compares references.
                    if (word.equals(search) && st.hasMoreTokens()) {
                        found.add(st.nextToken()); // store the token that follows the keyword
                    }
                }
            }
        } catch (IOException iox) {
            iox.printStackTrace();
        }
        System.out.println(found);
    }
}
Any help at all would be very much appreciated!
I recommend jsoup. It will retrieve and parse the page for you.
On daniweb, each tag cloud link has the CSS class tagcloudlink. So you just need to tell jsoup to extract all text in tags that have the class tagcloudlink.
This is off the top of my head plus some help from the jsoup site; I haven't tested it but it should get you started:
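import java.util.*;
import org.jsoup.Jsoup;
import org.jsoup.nodes.*;
import org.jsoup.select.Elements;

// Inside a method that declares or handles IOException:
List<String> tags = new ArrayList<String>();
Document doc = Jsoup.connect("http://daniweb.com/").get();
// Select every <a> element whose class is tagcloudlink
Elements taglinks = doc.select("a.tagcloudlink");
for (Element link : taglinks) {
    tags.add(link.text());
}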
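If you can't add a library to your project, here is a rough standard-library sketch of the same idea, using a regular expression instead of jsoup. It is not a real HTML parser: it assumes each tag cloud link is a plain <a> element with a double-quoted class attribute containing tagcloudlink and no nested tags inside the link. The class name TagCloudRegex, the method extractTags, and the sample HTML are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TagCloudRegex {
    // Matches <a ... class="... tagcloudlink ..." ...>text</a>.
    // Assumption: the link text contains no nested tags and the class attribute is double-quoted.
    private static final Pattern TAG_LINK = Pattern.compile(
        "<a\\b[^>]*\\bclass\\s*=\\s*\"[^\"]*\\btagcloudlink\\b[^\"]*\"[^>]*>([^<]*)</a>",
        Pattern.CASE_INSENSITIVE);

    // Returns the trimmed text of every matching link, in document order.
    public static List<String> extractTags(String html) {
        List<String> tags = new ArrayList<String>();
        Matcher m = TAG_LINK.matcher(html);
        while (m.find()) {
            tags.add(m.group(1).trim());
        }
        return tags;
    }

    public static void main(String[] args) {
        // Made-up sample markup; the real page may be structured differently.
        String html = "<div>Tag Cloud: "
            + "<a class=\"tagcloudlink\" href=\"/tags/java\">java</a> "
            + "<a href=\"/about\">about</a> "
            + "<a class=\"tagcloudlink\" href=\"/tags/php\"> php </a></div>";
        System.out.println(extractTags(html)); // prints [java, php]
    }
}
```

To use it on the real page, you would pass in the HTML you already read with BufferedReader (joined into one String) instead of the sample. Regexes break easily when the markup changes, so I would still reach for jsoup if you can.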
You could use HTML Parser for this. Another one I've used a lot and like is Jericho HTML Parser.