I'm using URL.openConnection() to download something from a server. The server says
Content-Type: text/plain; charset=utf-8
But connection.getContentEncoding() returns null. What's up?
URLConnection is the base class. HttpURLConnection is a derived class which you can use when you need the extra API and you are dealing with HTTP or HTTPS only. HttpsURLConnection is a 'more derived' class which you can use when you need the 'more extra' API and you are dealing with HTTPS only.
URLConnection is an abstract class that represents a communication link between the application and the resource a URL refers to. It can be used to read from and write to that resource, and protocol-specific subclasses such as HttpURLConnection extend it.
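As a rough sketch of that hierarchy, the snippet below opens connections for an http: and an https: URL and reports which subclass comes back. The class name ConnectionTypes, the describe helper, and the example.com URLs are placeholders and not part of the original question. Note that openConnection() only creates the connection object; nothing is sent until you connect or read from it.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URLConnection;
import javax.net.ssl.HttpsURLConnection;

public class ConnectionTypes {

    // Hypothetical helper: names the most specific connection type for a URL.
    public static String describe(String url) throws IOException {
        URLConnection connection = URI.create(url).toURL().openConnection();
        // Check the most derived class first, since HttpsURLConnection
        // is also an HttpURLConnection.
        if (connection instanceof HttpsURLConnection) {
            return "HttpsURLConnection";
        }
        if (connection instanceof HttpURLConnection) {
            return "HttpURLConnection";
        }
        return "URLConnection";
    }

    public static void main(String[] args) throws IOException {
        System.out.println(describe("http://example.com/"));
        System.out.println(describe("https://example.com/"));
    }
}
```

In practice you would cast to HttpURLConnection only when you need HTTP-specific calls such as getResponseCode().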
URLConnection.getContentEncoding() returns the value of the Content-Encoding header, not the charset from the Content-Type header.
Code from URLConnection.getContentEncoding():
/**
 * Returns the value of the <code>content-encoding</code> header field.
 *
 * @return the content encoding of the resource that the URL references,
 *         or <code>null</code> if not known.
 * @see java.net.URLConnection#getHeaderField(java.lang.String)
 */
public String getContentEncoding() {
    return getHeaderField("content-encoding");
}
Instead, call connection.getContentType() to retrieve the Content-Type and extract the charset from it. Here is some sample code showing how to do this:
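String contentType = connection.getContentType(); // null if the header is absent
String charset = "";
if (contentType != null) {
    // The first part is the media type; the remaining parts are parameters.
    for (String value : contentType.split(";")) {
        value = value.trim();
        // Case-insensitive match that does not depend on the default locale
        if (value.regionMatches(true, 0, "charset=", 0, "charset=".length())) {
            // Drop optional quotes, as in charset="utf-8"
            charset = value.substring("charset=".length()).replace("\"", "").trim();
        }
    }
}
if (charset.isEmpty()) {
    charset = "UTF-8"; // Assumption: fall back to UTF-8 when no charset is declared
}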
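If you need this in more than one place, the same parsing could be wrapped in a small helper that returns a java.nio.charset.Charset directly. This is only a sketch: the class name ContentTypeCharset, the method charsetOf, and the choice to return a caller-supplied fallback for missing or unknown charsets are my assumptions, not anything the JDK provides.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ContentTypeCharset {

    // Returns the charset named in a Content-Type value, or the fallback when
    // the header is missing, has no charset parameter, or names a charset
    // that this JVM does not recognise.
    public static Charset charsetOf(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        String prefix = "charset=";
        for (String part : contentType.split(";")) {
            String value = part.trim();
            if (value.regionMatches(true, 0, prefix, 0, prefix.length())) {
                String name = value.substring(prefix.length()).replace("\"", "").trim();
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // Covers both illegal and unsupported charset names.
                    return fallback;
                }
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        System.out.println(charsetOf("text/plain; charset=utf-8", StandardCharsets.ISO_8859_1));
        System.out.println(charsetOf("text/html", StandardCharsets.ISO_8859_1));
    }
}
```

The resulting Charset can be passed straight to new InputStreamReader(connection.getInputStream(), charset).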
This is documented behaviour, as the getContentEncoding() method is specified to return the contents of the Content-Encoding HTTP header, which is not set in your example. You could use the getContentType() method and parse the resulting String on your own, or possibly go for a more advanced HTTP client library like the one from Apache.
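To make the distinction concrete: Content-Encoding names a coding such as gzip that was applied to the body, while the charset in Content-Type says how to turn the decoded bytes into text. The sketch below handles both in that order. The class BodyDecoder and its methods are hypothetical, only gzip is handled, and readAllBytes() assumes Java 9 or later.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class BodyDecoder {

    // contentEncoding: value of the Content-Encoding header, may be null.
    // charset: character set taken from the Content-Type header.
    public static String decode(InputStream body, String contentEncoding, Charset charset)
            throws IOException {
        InputStream in = body;
        if (contentEncoding != null && contentEncoding.trim().equalsIgnoreCase("gzip")) {
            in = new GZIPInputStream(body); // undo the compression first
        }
        try (InputStream stream = in) {
            // Only now does the charset matter: bytes become characters here.
            return new String(stream.readAllBytes(), charset);
        }
    }

    // Test helper: gzip-compresses text the way a server might.
    public static byte[] gzip(String text, Charset charset) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buffer)) {
            out.write(text.getBytes(charset));
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] compressed = gzip("h\u00e9llo", StandardCharsets.UTF_8);
        System.out.println(decode(new ByteArrayInputStream(compressed), "gzip", StandardCharsets.UTF_8));
    }
}
```

With a real connection you would pass connection.getInputStream(), connection.getContentEncoding(), and the charset parsed from connection.getContentType().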
Just as an addition to the answer from @Buhake Sindi: if you are using Guava, you can do this instead of parsing manually:
// MediaType is com.google.common.net.MediaType; Optional is Guava's com.google.common.base.Optional
MediaType mediaType = MediaType.parse(httpConnection.getContentType());
Optional<Charset> typeCharset = mediaType.charset();