
HttpClient ignoring encoding, on a single computer

I am using HttpClient (version 3.1) on several different (but apparently identical) computers to read UTF-8 encoded JSON data from a URL.

On all the machines, save one, it works fine. I have some Spanish language words and they come through with accents and tildes intact.

One computer stubbornly refuses to cooperate. It is apparently treating the data as ISO-8859-1, despite a Content-Type: application/json;charset=utf-8 header.

If I use curl to access that URL from that computer, it works correctly. On every other computer, both curl and my HttpClient-based program work correctly.

I ran md5sum on the common-httpclient.jar file on each machine: the checksums are identical.

Is there some setting, deep in Linux, that might be different and be messing with me? Any other theories, or even places to look?

EDIT: some people asked for more details.

Originally I had the problem deep in the bowels of a complex Tomcat app, but I lightly adapted the sample to just retrieve the URL in question, and (fortunately) had the same problem.

These are Linux 2.6 machines running jdk1.7.0_45.

An env command yields a bunch of variables. The only one that looks remotely on point is LANG=en_US.UTF-8.

asked May 15 '14 06:05 by Michael Lorton



1 Answer

How do you get the json response data from HttpClient?

If you get it back in binary form (through getResponseBodyAsStream(), for example) and then convert it to a String without specifying a charset, the result depends on your JVM's default charset.
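To illustrate the point, here is a minimal sketch of what happens when the same UTF-8 bytes are decoded with two different charsets. It does not use HttpClient at all: the class name, the method names, and the sample JSON are made up for the demonstration, and the byte array stands in for whatever you read from the response stream.

```java
import java.nio.charset.StandardCharsets;

public class CharsetDecodeDemo {

    // Decode the raw bytes with an explicit charset, independent of the JVM default.
    static String decodeUtf8(byte[] body) {
        return new String(body, StandardCharsets.UTF_8);
    }

    // Roughly what new String(body) does on a JVM whose default charset is ISO-8859-1.
    static String decodeLatin1(byte[] body) {
        return new String(body, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // Hypothetical payload: "mañana" encoded as UTF-8, as the server would send it.
        byte[] body = "{\"word\":\"ma\u00f1ana\"}".getBytes(StandardCharsets.UTF_8);

        System.out.println(decodeUtf8(body));   // the ñ survives
        System.out.println(decodeLatin1(body)); // the ñ turns into two characters
    }
}
```

If your code has a new String(bytes) or an InputStreamReader without a charset argument somewhere, passing the charset explicitly (as in decodeUtf8 above) should make the result the same on every machine.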

You can check the value of JVM default charset by:

Charset.defaultCharset().name()

This might return "UTF-8" on every machine except the failing one.
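A small standalone check you could run on each machine, ideally with the same user and startup script that launch the real program, might look like the sketch below. The class and method names are invented for this example; it prints the default charset alongside the file.encoding system property, which the JVM normally derives from the environment at startup.

```java
import java.nio.charset.Charset;

public class DefaultCharsetCheck {

    // Build a one-line summary of the charset settings this JVM started with.
    static String describe() {
        return "default charset = " + Charset.defaultCharset().name()
                + ", file.encoding = " + System.getProperty("file.encoding");
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

If the odd machine reports something other than UTF-8, a commonly used workaround is to start the JVM with -Dfile.encoding=UTF-8, though specifying the charset explicitly in the code is the more robust fix.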

answered Oct 05 '22 22:10 by Fradenger
