Handling gzipped content on Android

Tags: java, android, gzip

I'm trying to parse a file from the web on Android using the DOM method.

The code in question is:

try {
    URL url = new URL("https://www.beatport.com/en-US/xml/content/home/detail/1/welcome_to_beatport");

    InputSource is = new InputSource(url.openStream());

    DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
    DocumentBuilder db = dbf.newDocumentBuilder();
    Document document = db.parse(is);
    document.getDocumentElement().normalize();
} catch(Exception e) {
    Log.v(TAG, "Exception = " + e);
}

But I'm getting the following exception:

V/XMLParseTest1(  846):Exception = org.xml.sax.SAXParseException: name expected (position:START_TAG <null>@2:176 in java.io.InputStreamReader@43ea4538) 

The file is being served to me gzipped. I've checked the is object in the debugger and its length is 6733 bytes (the same as the Content-Length in the response headers), but if I save the file to my hard drive from the browser its size is 59114 bytes. Furthermore, if I upload it to my own server, which doesn't gzip XML files when serving them, and point the URL there, the code runs just fine.

I'm guessing that what happens is that Android tries to parse the gzipped stream.

Is there a way to first unzip the stream? Any other ideas?

janosrusiczki asked Oct 03 '10 00:10


2 Answers

You can wrap the result of url.openStream() in a GZIPInputStream, e.g.:

InputSource is = new InputSource(new GZIPInputStream(url.openStream()));

To detect automatically when this is needed, check the Content-Encoding HTTP header, e.g.:

URLConnection connection = url.openConnection();
InputStream stream = connection.getInputStream();
if ("gzip".equals(connection.getContentEncoding())) {
  stream = new GZIPInputStream(stream);
}
InputSource is = new InputSource(stream);
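
For reference, the header check above can be combined with the DOM parsing from the question into one self-contained sketch. The class name GzipAwareXmlLoader and its method names are made up for illustration, and the sketch assumes the server labels compressed responses with exactly "gzip" (compared case-insensitively here); it is one possible arrangement, not the only way to do it.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import java.util.zip.GZIPInputStream;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical helper class; the names are illustrative only.
class GzipAwareXmlLoader {

    // Wraps the stream in a GZIPInputStream only when the
    // Content-Encoding header value says the body is gzipped.
    public static InputStream decodeIfGzipped(InputStream raw, String contentEncoding)
            throws IOException {
        if (contentEncoding != null && "gzip".equalsIgnoreCase(contentEncoding.trim())) {
            return new GZIPInputStream(raw);
        }
        return raw;
    }

    // Parses an (already decompressed) XML stream into a DOM document.
    public static Document parse(InputStream in) throws Exception {
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document document = db.parse(in);
        document.getDocumentElement().normalize();
        return document;
    }

    // Opens the URL, decompresses if needed, parses, and closes the stream.
    public static Document load(String address) throws Exception {
        URLConnection connection = new URL(address).openConnection();
        try (InputStream in = decodeIfGzipped(connection.getInputStream(),
                                              connection.getContentEncoding())) {
            return parse(in);
        }
    }
}
```

Calling GzipAwareXmlLoader.load(...) with the URL from the question would then replace the body of the original try block. Keeping decodeIfGzipped separate from the network call makes it easy to exercise with in-memory streams.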
Laurence Gonsalves answered Nov 01 '22 04:11


By default, this implementation of HttpURLConnection requests that servers use gzip compression. Since getContentLength() returns the number of bytes transmitted, you cannot use that method to predict how many bytes can be read from getInputStream(). Instead, read that stream until it is exhausted: when read() returns -1. Gzip compression can be disabled by setting the acceptable encodings in the request header:

urlConnection.setRequestProperty("Accept-Encoding", "identity");

With that header set, the response should arrive uncompressed, so there is nothing left to decompress.
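
A rough sketch of what the quoted documentation describes: request the identity encoding, then read the stream until read() returns -1 instead of trusting getContentLength(). The class name IdentityFetch, the method names, and the 4096-byte buffer size are arbitrary choices for illustration, and whether a given server honours the Accept-Encoding header is an assumption.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

// Hypothetical helper class; the names are illustrative only.
class IdentityFetch {

    // Reads the stream until it is exhausted, i.e. until read() returns -1.
    public static byte[] readFully(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return out.toByteArray();
    }

    // Asks the server not to compress the response, then reads the whole body.
    public static byte[] fetchUncompressed(String address) throws IOException {
        HttpURLConnection connection =
                (HttpURLConnection) new URL(address).openConnection();
        connection.setRequestProperty("Accept-Encoding", "identity");
        try (InputStream in = connection.getInputStream()) {
            return readFully(in);
        } finally {
            connection.disconnect();
        }
    }
}
```

The trade-off is bandwidth: for the file in the question that would mean transferring roughly 59 KB instead of about 6.7 KB.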

itindex answered Nov 01 '22 04:11