
How to check if InputStream is Gzipped?

Is there any way to check if InputStream has been gzipped? Here's the code:

    public static InputStream decompressStream(InputStream input) {
        try {
            GZIPInputStream gs = new GZIPInputStream(input);
            return gs;
        } catch (IOException e) {
            logger.info("Input stream not in the GZIP format, using standard format");
            return input;
        }
    }

I tried it this way, but it doesn't work as expected: the values read from the stream are invalid.

EDIT: Added the method I use to compress data:

    public static byte[] compress(byte[] content) {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try {
            GZIPOutputStream gs = new GZIPOutputStream(baos);
            gs.write(content);
            gs.close();
        } catch (IOException e) {
            logger.error("Fatal error occurred while compressing data");
            throw new RuntimeException(e);
        }
        double ratio = (1.0f * content.length / baos.size());
        if (ratio > 1) {
            logger.info("Compression ratio equals " + ratio);
            return baos.toByteArray();
        }
        logger.info("Compression not needed");
        return content;
    }
voo asked Jan 27 '11 15:01




2 Answers

It's not foolproof, but it's probably the easiest approach and doesn't rely on any external data. Like all decent formats, GZip begins with a magic number, which can be checked quickly without reading the entire stream.

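    import java.io.IOException;
    import java.io.InputStream;
    import java.io.PushbackInputStream;
    import java.util.zip.GZIPInputStream;

    public static InputStream decompressStream(InputStream input) throws IOException {
        // A PushbackInputStream lets us look ahead and then return the bytes to the stream.
        PushbackInputStream pb = new PushbackInputStream(input, 2);
        byte[] signature = new byte[2];
        int len = pb.read(signature); // read the signature; -1 means the stream is empty
        if (len <= 0) {
            return pb;
        }
        pb.unread(signature, 0, len); // push the signature back to the stream
        // Check whether it matches the standard gzip magic number (0x1f 0x8b).
        // A single read() may return only 1 byte; that case is treated as "not gzipped" here.
        if (len == 2 && signature[0] == (byte) 0x1f && signature[1] == (byte) 0x8b) {
            return new GZIPInputStream(pb);
        }
        return pb;
    }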

(Source for the magic number: GZip file format specification)

Update: I've just discovered that there is also a constant called GZIP_MAGIC in GZIPInputStream which contains this value, so if you really want to, you can use the lower two bytes of it.
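For illustration, here is a minimal sketch of the same check written against GZIP_MAGIC rather than hard-coded bytes. The class and method names (GzipMagicCheck, hasGzipMagic, decompressIfGzipped, readAll) are made up for this example. The constant is the int 0x8b1f, while the stream carries the low byte (0x1f) first, so the two header bytes are combined in little-endian order before comparing. The read loop is an addition that covers streams handing back fewer than two bytes per read.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper class, named for this example only.
class GzipMagicCheck {

    // Combine the first two bytes (low byte first) and compare with GZIP_MAGIC (0x8b1f).
    static boolean hasGzipMagic(byte[] header, int len) {
        if (len < 2) {
            return false;
        }
        int magic = (header[0] & 0xff) | ((header[1] & 0xff) << 8);
        return magic == GZIPInputStream.GZIP_MAGIC;
    }

    // Peek at up to two bytes, push them back, and wrap the stream only if they match.
    static InputStream decompressIfGzipped(InputStream input) throws IOException {
        PushbackInputStream pb = new PushbackInputStream(input, 2);
        byte[] header = new byte[2];
        int len = 0;
        while (len < 2) {
            int n = pb.read(header, len, 2 - len);
            if (n < 0) {
                break; // end of stream before two bytes were available
            }
            len += n;
        }
        if (len > 0) {
            pb.unread(header, 0, len);
        }
        return hasGzipMagic(header, len) ? new GZIPInputStream(pb) : pb;
    }

    // Drain a stream into a byte array.
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] plain = "hello".getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(baos)) {
            gz.write(plain);
        }
        byte[] fromGzip = readAll(decompressIfGzipped(new ByteArrayInputStream(baos.toByteArray())));
        byte[] fromPlain = readAll(decompressIfGzipped(new ByteArrayInputStream(plain)));
        // Both lines print "hello": one input was gzipped, the other passed through untouched.
        System.out.println(new String(fromGzip, StandardCharsets.UTF_8));
        System.out.println(new String(fromPlain, StandardCharsets.UTF_8));
    }
}
```

The same caveat applies as before: uncompressed data that happens to start with 0x1f 0x8b will be misdetected.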

biziclop answered Oct 06 '22 01:10



The InputStream comes from HttpURLConnection#getInputStream()

In that case you need to check whether the HTTP Content-Encoding response header equals gzip.

    URLConnection connection = url.openConnection();
    InputStream input = connection.getInputStream();

    if ("gzip".equals(connection.getContentEncoding())) {
        input = new GZIPInputStream(input);
    }

    // ...

This is all clearly specified in the HTTP spec.
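As a rough sketch, the header check can be pulled into a small helper that is testable without a network connection. The names ContentEncodingCheck, isGzipEncoding and wrap are hypothetical. It assumes the header value may be null (getContentEncoding() returns null when the header is absent), compares case-insensitively because HTTP content-coding names are case-insensitive, and also accepts "x-gzip", which the HTTP spec says recipients should treat as equivalent to "gzip". A header that lists several codings is not handled here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper class, named for this example only.
class ContentEncodingCheck {

    // True when the Content-Encoding header value names the gzip coding.
    static boolean isGzipEncoding(String contentEncoding) {
        if (contentEncoding == null) {
            return false; // header absent: treat the body as not encoded
        }
        String value = contentEncoding.trim();
        return value.equalsIgnoreCase("gzip") || value.equalsIgnoreCase("x-gzip");
    }

    // Wrap the response body in a GZIPInputStream only when the header says so.
    static InputStream wrap(InputStream body, String contentEncoding) throws IOException {
        return isGzipEncoding(contentEncoding) ? new GZIPInputStream(body) : body;
    }

    public static void main(String[] args) throws IOException {
        byte[] payload = "hello".getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(baos)) {
            gz.write(payload);
        }
        // Simulates a response body that arrived with "Content-Encoding: gzip".
        InputStream in = wrap(new ByteArrayInputStream(baos.toByteArray()), "gzip");
        byte[] buf = new byte[16];
        int n = in.read(buf);
        System.out.println(new String(buf, 0, n, StandardCharsets.UTF_8));
    }
}
```

With a real connection, the call would look like wrap(connection.getInputStream(), connection.getContentEncoding()).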


Update: regarding the way you compressed the source of the stream: this ratio check is pretty... insane. Get rid of it. The same length does not necessarily mean that the bytes are the same. Let it always return the gzipped stream, so that you can always expect a gzipped stream and just apply GZIPInputStream without nasty checks.
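A minimal sketch of that suggestion, assuming both ends are under your control: compress unconditionally, so the reader can always apply GZIPInputStream. The class name AlwaysGzip and the decompress helper are illustrative and not part of the original code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical class, named for this example only.
class AlwaysGzip {

    // Always returns gzipped bytes, even when the result is larger than the input.
    static byte[] compress(byte[] content) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(baos)) {
            gz.write(content);
        } // closing the stream writes the gzip trailer
        return baos.toByteArray();
    }

    // The reader needs no format detection because the input is always gzipped.
    static byte[] decompress(byte[] compressed) throws IOException {
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = gz.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] original = "short".getBytes(StandardCharsets.UTF_8);
        byte[] packed = compress(original);
        byte[] restored = decompress(packed);
        // Tiny inputs grow when gzipped, but the round trip is still lossless.
        System.out.println(packed.length > original.length);
        System.out.println(Arrays.equals(original, restored));
    }
}
```

The cost is a few extra bytes of gzip header and trailer on tiny inputs; in exchange, the reading side has a single code path.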

BalusC answered Oct 06 '22 03:10
