Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

java httpurlconnection cutting off html

Hey, I'm trying to get the html from a twitter profile page, but httpurlconnection is only returning a small snippet of the html. My code

for(int i = 0; i < urls.size(); i++)
{
URL url = new URL(urls.get(i));
HttpURLConnection connection = (HttpURLConnection) url.openConnection();
connection.setRequestProperty("User-Agent","Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.6) Gecko/20100625 Firefox/3.6.6");
System.out.println(connection.getResponseCode());
String line;
StringBuilder builder = new StringBuilder();
BufferedReader reader = new BufferedReader(new InputStreamReader(connection.getInputStream()));
while((line = reader.readLine()) != null)
{
    builder.append(line);
}
String html = builder.toString();
}

I always get 200 as the response code for each call. However about 1/3 of the time the entire html document is returned, and the other half only the first few hundred lines. The amount returned when the html is cutoff is not always the same.

Any ideas? Thanks for any help!

Additional Info: After viewing the headers it seems I'm getting duplicate content-length headers. The first is the full length, the other is much shorter (and probably representative of the length I'm getting some of the time) How can I handle duplicate headers?

like image 247
JDetloff Avatar asked Jul 19 '10 21:07

JDetloff


1 Answers

This worked fine for me, I added a newline after builder.append(line); to make it more readable in the console, but other than that it returned all the HTML for this page:

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;

public class RetrieveHTML {

    public static void main(String[] args) throws IOException {
        List<String> urls = new ArrayList<String>();
        urls.add("http://stackoverflow.com/questions/3285077/java-httpurlconnection-cutting-off-html");

        for (int i = 0; i < urls.size(); i++) {
            URL url = new URL(urls.get(i));
            HttpURLConnection connection = (HttpURLConnection) url.openConnection();
            connection.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.6) Gecko/20100625 Firefox/3.6.6");
            System.out.println(connection.getResponseCode());
            String line;
            StringBuilder builder = new StringBuilder();
            BufferedReader reader = new BufferedReader(new InputStreamReader(connection.getInputStream()));
            while ((line = reader.readLine()) != null) {
                builder.append(line);
                builder.append("\n"); 
            }
            String html = builder.toString();
            System.out.println("HTML " + html);
        }

    }
}
like image 176
Jon Avatar answered Sep 21 '22 22:09

Jon