As the title says... I read content from an HTTP response:
InputStream is = response.getEntity().getContent();
String cw = IOUtils.toString(is);
byte[] b = cw.getBytes("Cp1250");
String x = StringUtils.newStringUtf8(b);
String content = new String(b, "UTF-8");
System.out.println(content);
I have tried plenty of variations. I am a little confused about which encoding names are correct to pass as strings: windows-1250 or Cp1250? UTF-8, utf-8, or utf8?
Each encoding has a canonical (unique) name plus a number of aliases, and the names are case-insensitive. For instance, "UTF-8" is the canonical name, but some Java versions back it was "UTF8"; it was changed to follow common usage. The same goes for "windows-1250", which you may also see in HTML pages. "Cp1250" (code page) is a Java-internal name.
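A quick way to see which spellings Java accepts is to ask Charset.forName for each one and print the canonical name() it resolves to. This is a minimal sketch assuming a standard JDK, which ships windows-1250 (the Java spec itself only guarantees a few charsets such as UTF-8 and ISO-8859-1); the class and method names are made up for illustration.

```java
import java.nio.charset.Charset;

// Hypothetical helper: resolve any spelling of an encoding name to Java's canonical name.
final class CharsetNames {
    // Charset.forName matches names case-insensitively and accepts registered aliases;
    // name() always returns the canonical name.
    static String canonical(String anyName) {
        return Charset.forName(anyName).name();
    }

    public static void main(String[] args) {
        String[] names = {"UTF-8", "utf-8", "utf8", "UTF8",
                          "windows-1250", "Windows-1250", "Cp1250", "cp1250"};
        for (String n : names) {
            System.out.println(n + " -> " + canonical(n));
        }
        // Aliases this JVM registers for windows-1250 (set order is unspecified)
        System.out.println(Charset.forName("windows-1250").aliases());
    }
}
```

For UTF-8 specifically, the constant StandardCharsets.UTF_8 sidesteps the naming question, and the Charset overloads of getBytes and new String do not throw UnsupportedEncodingException the way the String-name overloads do.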
In Java, byte[] is binary data and String (internally Unicode) is text. Converting between the two needs an encoding. The encoding argument is often optional, but then the platform default is used (the operating system's encoding before Java 18, UTF-8 from Java 18 on).
binary (byte, InputStream, OutputStream) <-> text (String, char, Reader, Writer)
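To make the "needs an encoding" point concrete, here is a small sketch that decodes one and the same byte with two different charsets and gets two different characters. The byte value 0xA3 and the class name are chosen purely for illustration; it assumes the JVM provides windows-1250, as standard JDKs do.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: the bytes alone do not determine the text; the charset does.
final class SameBytesDifferentText {
    static String decode(byte[] bytes, Charset cs) {
        return new String(bytes, cs);
    }

    // Print code points rather than glyphs so the console's own encoding cannot interfere
    static String codePoint(String s) {
        return String.format("U+%04X", s.codePointAt(0));
    }

    public static void main(String[] args) {
        byte[] data = {(byte) 0xA3};
        System.out.println(codePoint(decode(data, Charset.forName("windows-1250")))); // U+0141 (Ł)
        System.out.println(codePoint(decode(data, StandardCharsets.ISO_8859_1)));     // U+00A3 (£)
        // The charset used when none is passed; this varies by JVM version and platform
        System.out.println(Charset.defaultCharset());
    }
}
```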
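String cw = IOUtils.toString(is, "UTF-8"); // an InputStream delivers bytes, so state how to decode them (use the response's actual encoding)
byte[] b = cw.getBytes("Cp1250");   // text -> Windows-1250 bytes
String x = new String(b, "Cp1250"); // Windows-1250 bytes -> text (equals cw if every char exists in Cp1250)
String content = x;
System.out.println(content);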
To make String universal with respect to encoding, it internally uses char, i.e. UTF-16. String constants are stored in the .class file as (modified) UTF-8, which is more compact.
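One visible consequence of the UTF-16 model: String.length() counts UTF-16 code units, while the number of bytes depends on which encoding you pick. (Since Java 9 the JVM may store Latin-1-only strings more compactly, but the API still behaves as UTF-16.) The sample characters and class name below are illustrative assumptions, and windows-1250 is assumed to be available.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo: chars vs. code points vs. encoded byte counts.
final class StringSizes {
    static int utf16Units(String s) { return s.length(); }
    static int codePoints(String s) { return s.codePointCount(0, s.length()); }
    static int byteCount(String s, Charset cs) { return s.getBytes(cs).length; }

    public static void main(String[] args) {
        Charset cp1250 = Charset.forName("windows-1250");
        String polish = "\u0141"; // Ł, which exists in Windows-1250
        System.out.println(utf16Units(polish));                        // 1
        System.out.println(byteCount(polish, StandardCharsets.UTF_8)); // 2
        System.out.println(byteCount(polish, cp1250));                 // 1

        String emoji = "\uD83D\uDE00"; // U+1F600, outside the Basic Multilingual Plane
        System.out.println(utf16Units(emoji));                         // 2 (a surrogate pair)
        System.out.println(codePoints(emoji));                         // 1
        System.out.println(byteCount(emoji, StandardCharsets.UTF_8));  // 4
    }
}
```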
You seem to think that a String object has an encoding. That's not correct. An encoding is used as part of the translation from binary data (a byte[] or InputStream) to text data (a String or char[] etc).
It's not clear what IOUtils.toString is doing here (the one-argument version decodes with the platform default encoding), but it's almost certainly losing data or at least handling it inappropriately. If your data is originally in Windows-1250, then you should use an InputStreamReader wrapping the InputStream, specifying the charset in the InputStreamReader constructor call.
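The InputStreamReader approach might look like the sketch below. A ByteArrayInputStream holding "Łódź" in Windows-1250 stands in for response.getEntity().getContent(); those bytes and the class name are assumptions for illustration. If the server declares a charset in its Content-Type header, that declared charset is the one to pass.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;

// Hypothetical helper: decode a Windows-1250 byte stream into a String.
final class ReadCp1250 {
    static String readCp1250(InputStream in) throws IOException {
        // The charset goes to the InputStreamReader constructor; the stream only supplies bytes.
        // try-with-resources closes the reader, which also closes the underlying stream.
        try (Reader reader = new InputStreamReader(in, Charset.forName("windows-1250"))) {
            StringBuilder sb = new StringBuilder();
            char[] buf = new char[4096];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
            return sb.toString();
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the HTTP body: "Łódź" encoded in Windows-1250
        byte[] body = {(byte) 0xA3, (byte) 0xF3, 0x64, (byte) 0x9F};
        String text = readCp1250(new ByteArrayInputStream(body));
        System.out.println(text.equals("\u0141\u00F3d\u017A")); // true
    }
}
```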
It's not clear where UTF-8 comes in. You might want to write out the data in UTF-8 afterwards, but the result of that would be a byte[], not a String.
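Writing the text out as UTF-8 could be sketched with an OutputStreamWriter, as below. A ByteArrayOutputStream stands in for the real destination (file, socket, ...) so the resulting byte[] is easy to inspect; the class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: encoding text as UTF-8 produces bytes, not another String.
final class WriteUtf8 {
    static void writeUtf8(String text, OutputStream out) throws IOException {
        Writer w = new OutputStreamWriter(out, StandardCharsets.UTF_8);
        w.write(text);
        w.flush(); // push the encoded bytes through; closing the stream is left to the caller
    }

    static byte[] toUtf8Bytes(String text) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeUtf8(text, out);
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] utf8 = toUtf8Bytes("\u0141\u00F3d\u017A"); // "Łódź"
        System.out.println(utf8.length); // 7: Ł, ó and ź take 2 bytes each, d takes 1
    }
}
```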
You're converting backwards. You need to get the input data as a byte array and then use new String(byteArray, "Cp1250") to create the String object. Then, if you want UTF-8, use String.getBytes("UTF-8").
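Put together, the bytes-first order described above might look like this sketch. It assumes Java 9+ for InputStream.readAllBytes and uses a ByteArrayInputStream as a hypothetical stand-in for the HTTP response body; the encoding names are the ones used in the answer.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: raw bytes -> String (decode once, correctly) -> UTF-8 bytes.
final class Cp1250ToUtf8 {
    static byte[] convert(InputStream in) throws IOException {
        byte[] raw = in.readAllBytes();          // 1. keep the bytes exactly as received (Java 9+)
        String text = new String(raw, "Cp1250"); // 2. decode with the charset they were written in
        return text.getBytes("UTF-8");          // 3. encode to UTF-8 only if bytes are needed
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the HTTP body: "Łódź" in Windows-1250 (4 bytes)
        InputStream body = new ByteArrayInputStream(new byte[] {(byte) 0xA3, (byte) 0xF3, 0x64, (byte) 0x9F});
        byte[] utf8 = convert(body);
        System.out.println(utf8.length); // 7
        System.out.println(new String(utf8, StandardCharsets.UTF_8).equals("\u0141\u00F3d\u017A")); // true
    }
}
```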