 

What is the best way to extract the entire content from a BufferedReader object in Java?

I'm trying to fetch an entire web page through a URLConnection.

What's the most efficient way to do this?

I'm doing this already:

URL url = new URL("http://www.google.com/");
URLConnection connection = url.openConnection();
InputStream in = connection.getInputStream();
BufferedReader bf = new BufferedReader(new InputStreamReader(in, "UTF-8"));
StringBuffer html = new StringBuffer();
String line;
while ((line = bf.readLine()) != null) {
    html.append(line).append('\n'); // readLine() strips line terminators, so re-add them
}
bf.close();

After the loop, html holds the entire HTML page.
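For comparison, the whole stream can also be slurped in a single call with a Scanner; the "\\A" delimiter matches the start of input, so one next() consumes everything. This is just a sketch (the class name and the UTF-8 charset are assumptions, not taken from the code above):

import java.io.InputStream;
import java.net.URL;
import java.util.Scanner;

public class SlurpPage {
    public static void main(String[] args) throws Exception {
        InputStream in = new URL("http://www.google.com/").openStream();
        // "\\A" matches only the start of input, so next() returns the whole stream at once
        Scanner scanner = new Scanner(in, "UTF-8").useDelimiter("\\A");
        String html = scanner.hasNext() ? scanner.next() : "";
        scanner.close();
        System.out.println(html.length());
    }
}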

asked Oct 12 '10 by santiagobasulto

1 Answer

I think this is the best way. The size of the page is fixed ("it is what it is"), so you can't improve on the memory footprint. Perhaps you could compress the contents once you have them, but they aren't very useful in that form. I would imagine that eventually you'll want to parse the HTML into a DOM tree.
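For that parsing step, a minimal sketch using the third-party jsoup library (an assumption on my part; any HTML parser would do) might look like this:

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class ParsePage {
    public static void main(String[] args) throws Exception {
        // jsoup fetches and parses in one step, returning a traversable document tree
        Document doc = Jsoup.connect("http://www.google.com/").get();
        System.out.println(doc.title());
    }
}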

Anything you do to parallelize the reading would overly complicate the solution.

I'd recommend using a StringBuilder with an initial capacity of 2048 or 4096.
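Applied to the code in the question, that suggestion would look roughly like this; the 4096 initial capacity is a guess to cut down early re-allocations, not a measured figure:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;

public class ReadPage {
    public static void main(String[] args) throws Exception {
        BufferedReader bf = new BufferedReader(new InputStreamReader(
                new URL("http://www.google.com/").openStream(), "UTF-8"));
        StringBuilder html = new StringBuilder(4096); // pre-sized so early appends don't trigger resizes
        String line;
        while ((line = bf.readLine()) != null) {
            html.append(line).append('\n');
        }
        bf.close();
        System.out.println(html.length());
    }
}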

What makes you think the code you posted isn't sufficient? You sound like you're guilty of premature optimization.

Run with what you have and sleep at night.

answered Sep 28 '22 by duffymo