okhttp 3: how to decompress gzip/deflate response manually using Java/Android

I know that the okhttp3 library adds the header Accept-Encoding: gzip by default and decodes the response automatically for us.

The problem is that I'm dealing with a host that only accepts a header like Accept-Encoding: gzip, deflate; if I don't add the deflate part, the request fails. But when I add that header to the okhttp client manually, the library no longer does the decompression for me.

I've tried multiple solutions that take the response and decompress it manually, but I always end up with an exception, i.e. java.util.zip.ZipException: Not in GZIP format. Here's what I've tried so far:

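import java.io.*;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;

// Decompresses a gzip-encoded stream into a String.
// Throws ZipException ("Not in GZIP format") if the stream is not actually gzipped.
public static String decompressGZIP(InputStream inputStream) throws IOException
{
    try (InputStream bodyStream = new GZIPInputStream(inputStream))
    {
        ByteArrayOutputStream outStream = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        int length;
        // read() returns -1 at end of stream
        while ((length = bodyStream.read(buffer)) != -1)
        {
            outStream.write(buffer, 0, length);
        }

        // Assumes the page is UTF-8; otherwise take the charset from Content-Type
        return new String(outStream.toByteArray(), StandardCharsets.UTF_8);
    }
}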


//run scraper
scrape(api, new Callback()
{
    // Something went wrong
    @Override
    public void onFailure(@NonNull Call call, @NonNull IOException e)
    {
    }

    @Override
    public void onResponse(@NonNull Call call, @NonNull Response response) throws IOException
    {
        // try-with-resources closes the body on every path, not only on success
        try (ResponseBody responseBody = response.body())
        {
            if (response.isSuccessful() && responseBody != null)
            {
                InputStream responseBodyBytes = responseBody.byteStream();
                String htmlResponse = decompressGZIP(responseBodyBytes);
                // use htmlResponse here
            }
        }
        catch (ProtocolException e)
        {
            // don't swallow the error silently
            e.printStackTrace();
        }
    }
});



private Call scrape(Map<?, ?> api, Callback callback)
{
    MediaType JSON = MediaType.parse("application/json; charset=utf-8");
    String method = (String) api.get("method");
    String url = (String) api.get("url");
    Request.Builder requestBuilder = new Request.Builder().url(url);
    RequestBody requestBody;

    requestBuilder.header("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:61.0) Gecko/20100101 Firefox/61.0");
    requestBuilder.header("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8");
    requestBuilder.header("Accept-Language", "en-US,en;q=0.5");
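    // Setting Accept-Encoding by hand turns off OkHttp's transparent gzip decoding,
    // so the response body must then be decompressed manually.
    requestBuilder.header("Accept-Encoding", "gzip, deflate");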
    requestBuilder.header("Connection", "keep-alive");
    requestBuilder.header("Upgrade-Insecure-Requests", "1");
    requestBuilder.header("Cache-Control", "max-age=0");

    Request request = requestBuilder.build();

    Call call = client.newCall(request);
    call.enqueue(callback);

    return call;
}

Just a note: the response headers always include Content-Encoding: gzip and Transfer-Encoding: chunked.

One more thing: I've also tried the solution in this topic, and it still fails with D/OkHttp: java.io.IOException: ID1ID2: actual 0x00003c68 != expected 0x00001f8b.

Any help would be appreciated.

Desolator asked Aug 17 '18 18:08


1 Answer

After 6 hours of digging I found the solution, and as usual it was easier than I thought. I was trying to decompress a page that wasn't gzipped in the first place, which is why it failed. (The 0x3c68 in the ID1ID2 error is ASCII for "<h", which suggests the body started with plain HTML rather than the gzip magic bytes 0x1f8b.) Once I hit the second page, which is compressed, I get a gzipped response that the code above should handle. If anyone wants the solution: I used a modified interceptor, just like the one in this answer, so you don't need a custom function to handle the decompression.
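
For anyone who still prefers to decompress by hand, the same idea (only gunzip what is actually gzipped) can be sketched with plain java.util.zip by peeking at the first two bytes before wrapping the stream. This is only a rough sketch: GzipSniffer, maybeGunzip, readAll and gzip are names made up for illustration, and it assumes the body is UTF-8 text.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper, not part of OkHttp.
class GzipSniffer {

    // Wraps the stream in GZIPInputStream only if it starts with the gzip magic bytes 0x1f 0x8b.
    public static InputStream maybeGunzip(InputStream in) throws IOException {
        PushbackInputStream pushback = new PushbackInputStream(in, 2);
        byte[] signature = new byte[2];
        int read = 0;
        // A single read() may return fewer than 2 bytes, so loop until we have both or hit EOF.
        while (read < 2) {
            int n = pushback.read(signature, read, 2 - read);
            if (n < 0) break;
            read += n;
        }
        // Put the peeked bytes back so the caller sees the stream from its start.
        if (read > 0) pushback.unread(signature, 0, read);

        boolean gzipped = read == 2
                && (signature[0] & 0xff) == 0x1f
                && (signature[1] & 0xff) == 0x8b;
        return gzipped ? new GZIPInputStream(pushback) : pushback;
    }

    // Reads the whole stream as UTF-8 text (assumption: the page is UTF-8).
    public static String readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    // Test helper: gzip-compresses a string.
    public static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        String html = "<html>hello</html>";
        byte[] plain = html.getBytes(StandardCharsets.UTF_8);
        // Both calls give back the same text, whether or not the input was compressed.
        System.out.println(readAll(maybeGunzip(new ByteArrayInputStream(plain))));
        System.out.println(readAll(maybeGunzip(new ByteArrayInputStream(gzip(html)))));
    }
}
```

With OkHttp this would be called as maybeGunzip(responseBody.byteStream()), so an uncompressed first page no longer throws Not in GZIP format.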

I modified the unzip method to make the okhttp interceptor work with compressed and uncompressed responses:

OkHttpClient.Builder clientBuilder = new OkHttpClient.Builder().addInterceptor(new UnzippingInterceptor());
OkHttpClient client = clientBuilder.build();

And the interceptor looks like this:

private class UnzippingInterceptor implements Interceptor {
    @Override
    public Response intercept(Chain chain) throws IOException {
        Response response = chain.proceed(chain.request());
        return unzip(response);
    }

    // adapted from okhttp3.internal.http.HttpEngine (the original method is private)
    private Response unzip(final Response response) throws IOException {
        if (response.body() == null)
        {
            return response;
        }

        // check if we have a gzip response
        String contentEncoding = response.header("Content-Encoding");

        // only decompress gzipped responses; everything else passes through untouched
        if ("gzip".equalsIgnoreCase(contentEncoding))
        {
            GzipSource responseBody = new GzipSource(response.body().source());
            // these headers describe the compressed bytes, so drop them
            Headers strippedHeaders = response.headers().newBuilder()
                    .removeAll("Content-Encoding")
                    .removeAll("Content-Length")
                    .build();
            // length is unknown (-1) after decompression; contentType() may be null
            return response.newBuilder().headers(strippedHeaders)
                    .body(ResponseBody.create(response.body().contentType(), -1L, Okio.buffer(responseBody)))
                    .build();
        }
        else
        {
            return response;
        }
    }
}
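
Since the request advertises gzip, deflate, the server is also allowed to answer with Content-Encoding: deflate, which the interceptor above passes through undecoded. A minimal sketch of picking a decoder from the header value, using only java.util.zip, might look like this. ContentDecoder, decode and readAll are hypothetical names, and it assumes zlib-wrapped deflate as HTTP specifies; a server that sends raw deflate would need an Inflater created with new Inflater(true) instead.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.zip.GZIPInputStream;
import java.util.zip.InflaterInputStream;

// Hypothetical helper, not part of OkHttp.
class ContentDecoder {

    // Wraps the raw body according to the Content-Encoding header value (null means no encoding).
    public static InputStream decode(String contentEncoding, InputStream body) throws IOException {
        String encoding = contentEncoding == null
                ? "identity"
                : contentEncoding.trim().toLowerCase(Locale.ROOT);
        switch (encoding) {
            case "gzip":
            case "x-gzip":
                return new GZIPInputStream(body);
            case "deflate":
                // Assumption: zlib-wrapped deflate (RFC 1950), the default for InflaterInputStream.
                return new InflaterInputStream(body);
            case "identity":
            case "":
                return body;
            default:
                throw new IOException("Unsupported Content-Encoding: " + contentEncoding);
        }
    }

    // Reads the whole stream as UTF-8 text (assumption: the page is UTF-8).
    public static String readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

In the callback this could be used as ContentDecoder.decode(response.header("Content-Encoding"), responseBody.byteStream()), which covers gzip, deflate and uncompressed pages in one place.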

Desolator answered Nov 13 '22 19:11