How to retrieve HTML page in proper encoding using Java?

How can I read an HTTP stream containing an HTML page using the page's own encoding?

Here is the code fragment I use to get the HTTP stream. InputStreamReader takes an optional encoding argument, but I have no idea how to obtain the right value for it.

URLConnection conn = url.openConnection();
InputStream is = conn.getInputStream();
BufferedReader d = new BufferedReader(new InputStreamReader(is));

2 Answers

Retrieving a web page is a reasonably complicated process, which is why libraries such as HttpClient exist. My advice: unless you have a really compelling reason to do otherwise, use HttpClient.

cletus answered Apr 20 '26 15:04


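When the connection is established through

URLConnection conn = url.openConnection();

the response headers are available on conn (URL itself has no getContentEncoding() method). Note that conn.getContentEncoding() returns the Content-Encoding header, which describes compression such as gzip, not the character set. The character set, when the server declares one, is the charset parameter of the Content-Type header returned by conn.getContentType(). Extract it and pass it to InputStreamReader, falling back to a default when it is absent (UTF-8 here, which is an assumption), so the code looks like this (Matcher and Pattern come from java.util.regex):

Matcher m = Pattern.compile("charset=\"?([^;\"\\s]+)", Pattern.CASE_INSENSITIVE).matcher(String.valueOf(conn.getContentType()));
BufferedReader d = new BufferedReader(new InputStreamReader(is, m.find() ? m.group(1) : "UTF-8"));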
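For reference, here is a minimal sketch of picking the charset out of the Content-Type response header, which is where servers normally declare it. The class name PageCharset, its method names, and the UTF-8 fallback are assumptions made for illustration, not part of any library. It only looks at the HTTP header, so a charset declared solely in an HTML meta tag would be missed.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the name and the UTF-8 fallback are assumptions.
class PageCharset {
    // Matches the charset parameter of a Content-Type value, quoted or not.
    private static final Pattern CHARSET =
            Pattern.compile("charset\\s*=\\s*\"?([^\\s;\"]+)", Pattern.CASE_INSENSITIVE);

    // Returns the charset named in the header, or UTF-8 if it is missing or unknown.
    static Charset fromContentType(String contentType) {
        if (contentType != null) {
            Matcher m = CHARSET.matcher(contentType);
            if (m.find()) {
                try {
                    return Charset.forName(m.group(1));
                } catch (IllegalArgumentException e) {
                    // Illegal or unsupported charset name: fall through to the default.
                }
            }
        }
        return StandardCharsets.UTF_8;
    }

    // Opens the URL and wraps the body in a reader that uses the declared charset.
    static BufferedReader openReader(URL url) throws IOException {
        URLConnection conn = url.openConnection();
        Charset charset = fromContentType(conn.getContentType());
        return new BufferedReader(new InputStreamReader(conn.getInputStream(), charset));
    }
}
```

With this in place, the fragment from the question could become BufferedReader d = PageCharset.openReader(url); and the reader would decode the body using whatever charset the server announced.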

Niger answered Apr 20 '26 13:04