
Use data retrieved from HTTPClient into JSoup

I am using HTTPClient to connect to a website. The following snippet is used for this purpose:

 byte[] responseBody = method.getResponseBody();
 System.out.println(new String(responseBody));

The above code prints the HTML of the page. I only need some of the data in it, which I was able to extract with JSoup using the following snippet:

Document doc = Jsoup.connect(url).get();

Here I pass the website's URL directly to JSoup as "url", which means I do not need HTTPClient at all if I use JSoup this way. Is there a way to feed the "responseBody" retrieved through HTTPClient into JSoup, so that I do not have to call Document doc = Jsoup.connect(url).get()?

Thanks

user2822187 asked Feb 18 '14 09:02


1 Answer

You can parse the HTML directly through Jsoup#parse:

Document doc = Jsoup.parse(new String(responseBody));

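I do have concerns about converting a byte array to a String directly, since new String(byte[]) decodes with the platform's default charset, which may not match the page's encoding. In your case, however, it should work fine.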
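To make the charset concern concrete, here is a minimal sketch of decoding the response bytes with an explicit charset instead of the platform default. The class name ResponseDecoding and the helper decode are hypothetical names chosen for illustration, and the UTF-8 fallback is an assumption about the page rather than something the question states.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ResponseDecoding {

    // Hypothetical helper: decode a response body using a named charset.
    // Falls back to UTF-8 (an assumption) when the name is null, malformed,
    // or not supported by this JVM.
    public static String decode(byte[] body, String charsetName) {
        try {
            return new String(body, Charset.forName(charsetName));
        } catch (IllegalArgumentException e) {
            // Charset.forName throws IllegalArgumentException (or a subclass)
            // for null, illegal, and unsupported charset names.
            return new String(body, StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) {
        // Stand-in for method.getResponseBody(): a UTF-8 encoded page fragment.
        byte[] responseBody = "<p>caf\u00e9</p>".getBytes(StandardCharsets.UTF_8);

        // Decoded with the charset the bytes were written in.
        System.out.println(decode(responseBody, "UTF-8"));

        // Decoded with a mismatched charset: the accented character is garbled.
        System.out.println(decode(responseBody, "ISO-8859-1"));
    }
}
```

The decoded string can then be handed to Jsoup.parse(...) exactly as in the one-liner above.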

Alternatively, you can use URLConnection, get a handle on the InputStream, and read it into a String with an explicit charset. Note that getContentEncoding() returns the Content-Encoding header (for example gzip), not the charset, so the snippet below assumes the page is UTF-8:

URLConnection connection = new URL("http://www.stackoverflow.com").openConnection();
InputStream inStream = connection.getInputStream();
// The charset is passed explicitly; UTF-8 is an assumption about this page
String htmlText = org.apache.commons.io.IOUtils.toString(inStream, "UTF-8");

// Parse the HTML string and select the table cells
Document document = Jsoup.parse(htmlText);
Elements els = document.select("tbody > tr > td");

for (Element el : els) {
    System.out.println(el.text());
}

Would give:

Stack Overflow Server Fault Super User Web Applications Ask Ubuntu Webmasters Game Development TeX - LaTeX
Programmers Unix & Linux Ask Different (Apple) WordPress Answers Geographic Information Systems Electrical Engineering Android Enthusiasts Information Security
Database Administrators Drupal Answers SharePoint User Experience Mathematica more (14)
...
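With URLConnection, the charset is normally carried as a parameter of the Content-Type header (available through getContentType()), whereas getContentEncoding() reports the Content-Encoding header. A minimal sketch of pulling the charset out of a Content-Type value follows; the class ContentTypeCharset and the helper charsetFromContentType are hypothetical names, and the simple split on ";" is an assumption that covers common header values rather than a full header parser.

```java
import java.util.Locale;

public class ContentTypeCharset {

    // Hypothetical helper: extract the charset parameter from a Content-Type
    // header value such as "text/html; charset=utf-8". Returns the fallback
    // when the header is null or carries no charset parameter.
    public static String charsetFromContentType(String contentType, String fallback) {
        if (contentType == null) {
            return fallback;
        }
        for (String part : contentType.split(";")) {
            String param = part.trim();
            if (param.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String value = param.substring("charset=".length()).trim();
                // Strip optional surrounding quotes, e.g. charset="utf-8"
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value.isEmpty() ? fallback : value;
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        System.out.println(charsetFromContentType("text/html; charset=utf-8", "UTF-8"));
        System.out.println(charsetFromContentType("text/html", "UTF-8"));
    }
}
```

In the URLConnection snippet, the header value would come from connection.getContentType(), and the result could be passed as the second argument of IOUtils.toString.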
StoopidDonut answered Oct 05 '22 15:10