I am using HttpClient (version 3.1) on several different (but apparently identical) computers to read UTF-8 encoded JSON data from a URL.
On all the machines, save one, it works fine. I have some Spanish language words and they come through with accents and tildes intact.
One computer stubbornly refuses to cooperate. It apparently treats the data as ISO-8859-1, despite a Content-Type: application/json;charset=utf-8 header.
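To make the header's role concrete, here is a minimal sketch of pulling the charset parameter out of a Content-Type value such as application/json;charset=utf-8. The class and method names (ContentTypeCharset, charsetFromContentType) are hypothetical, and the ISO-8859-1 fallback is an assumption for illustration. As far as I know, HttpClient 3.1 does comparable parsing internally in HttpMethodBase.getResponseCharSet(), but only its own getResponseBodyAsString() path uses that result; bytes you read yourself are not decoded for you.

```java
import java.nio.charset.Charset;

class ContentTypeCharset {
    // Hypothetical helper: returns the charset parameter of a Content-Type
    // value, or the given fallback when no charset parameter is present.
    static String charsetFromContentType(String contentType, String fallback) {
        if (contentType == null) {
            return fallback;
        }
        // Parameters follow the media type and are separated by ';'
        String[] parts = contentType.split(";");
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            int eq = param.indexOf('=');
            if (eq > 0 && param.substring(0, eq).trim().equalsIgnoreCase("charset")) {
                String value = param.substring(eq + 1).trim();
                // Strip optional surrounding quotes, e.g. charset="utf-8"
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value;
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        String header = "application/json;charset=utf-8";
        String name = charsetFromContentType(header, "ISO-8859-1");
        // Charset.forName is case-insensitive and name() returns the canonical name
        System.out.println(Charset.forName(name).name()); // prints "UTF-8"
    }
}
```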
If I use curl to access that URL from that computer, it works correctly. On every other computer, both curl and my HttpClient-based program work correctly.
I ran md5sum on the common-httpclient.jar file on each machine: the checksums are identical.
Is there some setting, deep in Linux, that might be different and be messing with me? Any other theories, or even places to look?
EDIT: some people asked for more details.
Originally I had the problem deep in the bowels of a complex Tomcat app, but I lightly adapted the sample to just retrieve the URL in question, and (fortunately) had the same problem.
These are Linux 2.6 machines running jdk1.7.0_45.
An env command yields a bunch of variables. The only one that looks remotely on point is LANG=en_US.UTF-8.
How do you get the JSON response data from HttpClient?
If you get it back in binary form (through getResponseBodyAsStream(), for example) and then convert it to a String without specifying a charset, the result depends on your JVM's default charset.
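A minimal sketch of the fix is to name the charset explicitly when turning the bytes into text. The class ExplicitDecode and its readAll method are hypothetical; the byte array stands in for the response body, so the example runs without HttpClient. It encodes "año" as UTF-8 and then decodes it once as UTF-8 and once as ISO-8859-1 to show the mangling.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ExplicitDecode {
    // Reads the whole stream as text using the given charset instead of
    // the JVM default, so the result is the same on every machine.
    static String readAll(InputStream in, Charset charset) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, charset)) {
            char[] buf = new char[4096];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the response body: "año" encoded as UTF-8 (4 bytes)
        byte[] body = "a\u00f1o".getBytes(StandardCharsets.UTF_8);

        String right = readAll(new ByteArrayInputStream(body), StandardCharsets.UTF_8);
        String wrong = readAll(new ByteArrayInputStream(body), StandardCharsets.ISO_8859_1);

        System.out.println(right.equals("a\u00f1o")); // true
        System.out.println(wrong.length());            // 4: the two bytes of ñ became two chars
    }
}
```

With HttpClient 3.1, the call would presumably look like readAll(method.getResponseBodyAsStream(), StandardCharsets.UTF_8), or Charset.forName(method.getResponseCharSet()) if you prefer to trust the Content-Type header; I have not run this against the failing machine.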
You can check the JVM's default charset with:
Charset.defaultCharset().name()
This might give "UTF-8" on all machines except the failing one.
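A small diagnostic along these lines prints what the JVM actually picked. Note that the env output from an interactive shell is not necessarily the environment a Tomcat JVM inherits, so it is worth running this under the same user and startup script as the failing program. The class name ShowDefaultCharset is hypothetical; file.encoding is the system property the default charset is normally derived from on this JDK generation. Starting the JVM with -Dfile.encoding=UTF-8 is a commonly used workaround, but decoding with an explicit charset does not depend on it.

```java
import java.nio.charset.Charset;

class ShowDefaultCharset {
    public static void main(String[] args) {
        // Default charset chosen by the JVM at startup (derived from the locale)
        System.out.println("defaultCharset = " + Charset.defaultCharset().name());
        // System property the default is normally derived from
        System.out.println("file.encoding  = " + System.getProperty("file.encoding"));
        // Locale variables as seen by this process (may print null if unset)
        System.out.println("LANG           = " + System.getenv("LANG"));
        System.out.println("LC_ALL         = " + System.getenv("LC_ALL"));
    }
}
```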