I am attempting to convert the gzipped body of an HTTP response to plaintext. I've taken the byte array of this response and wrapped it in a ByteArrayInputStream, then wrapped that in a GZIPInputStream. I now want to read the GZIPInputStream and store the final decompressed HTTP response body as a plaintext String.
This code will store the final decompressed contents in an OutputStream, but I want to store the contents as a String:
    public static int sChunk = 8192;

    ByteArrayInputStream bais = new ByteArrayInputStream(responseBytes);
    GZIPInputStream gzis = new GZIPInputStream(bais);
    byte[] buffer = new byte[sChunk];
    int length;
    while ((length = gzis.read(buffer, 0, sChunk)) != -1) {
        out.write(buffer, 0, length);
    }
To decode bytes from an InputStream, you can use an InputStreamReader; a BufferedReader then lets you read the stream line by line. This assumes the gzipped content is text rather than binary data, which is the case here: the content is text only.
    byte[] compressed = compress(string); // in the main method

    public static byte[] compress(String str) throws Exception {
        ...
        ...
        return obj.toByteArray();
    }

    public static String decompress(byte[] bytes) throws Exception {
        ...
        GZIPInputStream gis = new GZIPInputStream(new ByteArrayInputStream(bytes));
        ...
    }
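For reference, here is a minimal, self-contained sketch of a compress/decompress pair along the lines of the outline above. It is not the elided code itself: the class name `GzipRoundTrip`, the buffer size, and the choice of UTF-8 are all assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper class; the name and the UTF-8 charset are assumptions.
public class GzipRoundTrip {

    // Encode the text as UTF-8 and gzip the resulting bytes.
    public static byte[] compress(String text) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(sink)) {
            gzip.write(text.getBytes(StandardCharsets.UTF_8));
        } // closing the GZIPOutputStream writes the gzip trailer
        return sink.toByteArray();
    }

    // Gunzip the bytes into memory, then decode them as UTF-8 text.
    public static String decompress(byte[] gzipped) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (GZIPInputStream gzip =
                 new GZIPInputStream(new ByteArrayInputStream(gzipped))) {
            byte[] buffer = new byte[8192];
            int length;
            while ((length = gzip.read(buffer)) != -1) {
                sink.write(buffer, 0, length);
            }
        }
        return new String(sink.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        String original = "Hello, gzip!";
        String restored = decompress(compress(original));
        System.out.println(restored.equals(original));
    }
}
```

Decoding the bytes only after the whole stream has been read avoids splitting a multi-byte character across two buffer reads, which can happen if each chunk is converted to a String separately.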
To decode bytes from an InputStream, you can use an InputStreamReader. Then, a BufferedReader will allow you to read your stream line by line.
Your code will look like:
    ByteArrayInputStream bais = new ByteArrayInputStream(responseBytes);
    GZIPInputStream gzis = new GZIPInputStream(bais);
    // No charset is given here, so the platform default is used for decoding.
    InputStreamReader reader = new InputStreamReader(gzis);
    BufferedReader in = new BufferedReader(reader);

    String line;
    while ((line = in.readLine()) != null) {
        System.out.println(line);
    }
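Since the question asks for a String rather than printed output, the same line-by-line loop can append to a StringBuilder instead. The following is only a sketch: the class name `GzipLines`, the `gzip` demo helper, and the UTF-8 charset are assumptions, not part of the original code.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper class; names and the UTF-8 charset are assumptions.
public class GzipLines {

    // Read the gzipped bytes line by line and collect them into one String.
    public static String readAll(byte[] responseBytes) throws IOException {
        try (BufferedReader in = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(new ByteArrayInputStream(responseBytes)),
                StandardCharsets.UTF_8))) {
            StringBuilder body = new StringBuilder();
            String line;
            while ((line = in.readLine()) != null) {
                // readLine() strips the line terminator, so add one back.
                body.append(line).append('\n');
            }
            return body.toString();
        }
    }

    // Demo-only helper that produces gzipped input for readAll.
    public static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(sink)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        System.out.print(readAll(gzip("first line\r\nsecond line")));
    }
}
```

Note that this approach normalizes every line ending to `\n` and always ends the result with a newline. If the body must be preserved exactly, reading into a char buffer is the safer choice.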
You should rather have obtained the response as an InputStream instead of as byte[]. Then you can ungzip it using GZIPInputStream, read it as character data using InputStreamReader, and finally write it as character data into a String using StringWriter.
    String body = null;
    String charset = "UTF-8"; // You should determine it based on response header.

    try (
        InputStream gzippedResponse = response.getInputStream();
        InputStream ungzippedResponse = new GZIPInputStream(gzippedResponse);
        Reader reader = new InputStreamReader(ungzippedResponse, charset);
        Writer writer = new StringWriter();
    ) {
        char[] buffer = new char[10240];
        for (int length = 0; (length = reader.read(buffer)) > 0;) {
            writer.write(buffer, 0, length);
        }
        body = writer.toString();
    }

    // ...
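The comment in the code above says the charset should come from the response header. One way that might look is sketched below, assuming you already have the raw `Content-Type` header value as a String. The class `ResponseCharset` and the method `fromContentType` are hypothetical names, and falling back to UTF-8 is an assumption, not something the HTTP client does for you.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; the name and the UTF-8 fallback are assumptions.
public class ResponseCharset {

    // Extract the charset parameter from a Content-Type header value,
    // e.g. "text/html; charset=ISO-8859-1". Falls back to UTF-8 when the
    // header is missing, has no charset parameter, or names an unknown one.
    public static Charset fromContentType(String contentType) {
        if (contentType != null) {
            for (String param : contentType.split(";")) {
                String trimmed = param.trim();
                // Case-insensitive check for a leading "charset=".
                if (trimmed.regionMatches(true, 0, "charset=", 0, 8)) {
                    String name = trimmed.substring(8).trim().replace("\"", "");
                    try {
                        return Charset.forName(name);
                    } catch (IllegalArgumentException e) {
                        break; // illegal or unsupported charset name
                    }
                }
            }
        }
        return StandardCharsets.UTF_8;
    }

    public static void main(String[] args) {
        System.out.println(fromContentType("text/html; charset=ISO-8859-1"));
    }
}
```

The returned Charset can be passed straight to `new InputStreamReader(ungzippedResponse, charset)`, since InputStreamReader also has a constructor that takes a Charset instead of a charset name.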
If your final intent is to parse the response as HTML, then I strongly recommend just using an HTML parser such as Jsoup. It's then as easy as:
String html = Jsoup.connect("http://google.com").get().html();