
Understanding ZipSecureFile.setMinInflateRatio(double ratio)

I am using this function call because reading a trusted file otherwise results in a zip bomb error.

ZipSecureFile.setMinInflateRatio(double ratio)

   FileInputStream file = new FileInputStream("/file/path/report.xlsx"); 
   ZipSecureFile.setMinInflateRatio(-1.0d);
   XSSFWorkbook wb = new XSSFWorkbook(file);

I am trying to understand how it works.

The only source I could find is https://poi.apache.org/apidocs/org/apache/poi/openxml4j/util/ZipSecureFile.html

But I couldn't get a clear picture from it, as I am new to this concept.

What are the differences between

ZipSecureFile.setMinInflateRatio(-1.0d);

vs

ZipSecureFile.setMinInflateRatio(0.009);

vs

ZipSecureFile.setMinInflateRatio(0);
Chid asked Dec 18 '22 03:12



1 Answer

Zip bomb detection works the following way:

While uncompressing, it checks the ratio compressedBytes/uncompressedBytes. If this ratio falls below a configured threshold (MinInflateRatio), a bomb is detected.

So if the ratio compressedBytes/uncompressedBytes is 0.01d, for example, the compressed file is 100 times smaller than the uncompressed one without any loss of information. In other words, the compressed file stores the same information in only 1% of the size the uncompressed one needs. This is really unlikely with real-life data.
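The check described above can be sketched in a few lines of plain Java. This is only an illustration of the comparison, not POI's actual implementation; the class `InflateRatioCheck` and the method `looksLikeZipBomb` are hypothetical names made up for this example.

```java
public class InflateRatioCheck {

    // Hypothetical helper, not part of Apache POI: flags the data as a
    // possible zip bomb when compressedBytes/uncompressedBytes falls
    // below the given minimum inflate ratio.
    public static boolean looksLikeZipBomb(long compressedBytes,
                                           long uncompressedBytes,
                                           double minInflateRatio) {
        if (uncompressedBytes == 0) {
            return false; // nothing has been inflated, so there is nothing to judge
        }
        double ratio = (double) compressedBytes / uncompressedBytes;
        return ratio < minInflateRatio;
    }

    public static void main(String[] args) {
        // 1,000 compressed bytes inflating to 50,000 bytes: ratio 0.02
        System.out.println(looksLikeZipBomb(1_000, 50_000, 0.01));  // false
        // 1,000 compressed bytes inflating to 200,000 bytes: ratio 0.005
        System.out.println(looksLikeZipBomb(1_000, 200_000, 0.01)); // true
    }
}
```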

To show how unlikely it is, we can take a look (in a popular-science manner) at how compression works:

Let's have the string

"This is a test for compressing having long count of characters which always occurs the same sequence."

This needs 101 bytes. Let's say this string occurs 100,000 times in the file. Uncompressed, it would then need 10,100,000 bytes. A compression algorithm would give that string an ID, store the string only once, mapped to that ID, and store the ID 100,000 times, once for each place the string occurs in the file. That would need 101 bytes + 1 byte (ID) + 100,000 bytes (IDs) = 100,102 bytes. This gives a ratio compressedBytes/uncompressedBytes of 0.009911089d.
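The arithmetic above can be reproduced with a short program. The sketch below computes the idealized numbers from this simplified model and, for comparison, runs the same repeated string through `java.util.zip.Deflater`. Note the assumptions: the class and method names are made up for this example, `Deflater` in its default mode writes a zlib wrapper rather than a zip entry, and the real compressor works differently from the ID model, so its exact byte count will differ from 100,102.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

public class RepeatedStringRatio {

    static final String SENTENCE =
        "This is a test for compressing having long count of characters which always occurs the same sequence.";
    static final int REPETITIONS = 100_000;

    // Size of the uncompressed data: the sentence repeated REPETITIONS times.
    public static long uncompressedBytes() {
        return (long) SENTENCE.getBytes(StandardCharsets.US_ASCII).length * REPETITIONS;
    }

    // Idealized model from the text: the string stored once,
    // plus 1 byte for its ID, plus one ID byte per occurrence.
    public static long idealizedCompressedBytes() {
        return SENTENCE.getBytes(StandardCharsets.US_ASCII).length + 1 + REPETITIONS;
    }

    // Compresses the repeated sentence with Deflater and returns the output size.
    public static long deflatedBytes() {
        byte[] unit = SENTENCE.getBytes(StandardCharsets.US_ASCII);
        byte[] input = new byte[unit.length * REPETITIONS];
        for (int i = 0; i < REPETITIONS; i++) {
            System.arraycopy(unit, 0, input, i * unit.length, unit.length);
        }
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        byte[] buffer = new byte[8192];
        long total = 0;
        while (!deflater.finished()) {
            total += deflater.deflate(buffer); // only the byte count matters here
        }
        deflater.end();
        return total;
    }

    public static void main(String[] args) {
        long uncompressed = uncompressedBytes();
        long idealized = idealizedCompressedBytes();
        System.out.println("uncompressed bytes: " + uncompressed);
        System.out.println("idealized compressed bytes: " + idealized);
        System.out.println("idealized ratio: " + (double) idealized / uncompressed);
        System.out.println("Deflater ratio: " + (double) deflatedBytes() / uncompressed);
    }
}
```

Both ratios end up below 0.01 for this artificial input, which is exactly the kind of data the check is meant to catch.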

So if we set MinInflateRatio lower than 0.01d, we accept such unlikely compression ratios.

Also, we can see that the ratio compressedBytes/uncompressedBytes can only be 0 if compressedBytes is 0. But this would mean that there are no bytes to uncompress. So a MinInflateRatio of 0.0d can never be reached or undershot, which means that with a MinInflateRatio of 0.0d all possible ratios will be accepted.

Of course, a MinInflateRatio of -1.0d can also never be reached or undershot. So with this value, too, all possible ratios will be accepted.
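To compare the values from the question side by side, here is a minimal sketch that applies each threshold to the example ratio of about 0.0099 from above. It assumes the simple rule "a ratio is accepted if it is not below the threshold"; `MinInflateRatioComparison` and `accepted` are hypothetical names for this illustration and not POI code. The value 0.01 is included as the baseline that the explanation above starts from.

```java
public class MinInflateRatioComparison {

    // Assumed rule for this sketch: a ratio is accepted
    // as long as it is not below the configured minimum.
    public static boolean accepted(double ratio, double minInflateRatio) {
        return ratio >= minInflateRatio;
    }

    public static void main(String[] args) {
        // Example ratio from the repeated-string calculation (about 0.0099).
        double exampleRatio = 100_102.0 / 10_100_000.0;

        double[] thresholds = {0.01, 0.009, 0.0, -1.0};
        for (double threshold : thresholds) {
            // 0.01 rejects the example; 0.009, 0.0 and -1.0 accept it.
            System.out.println("minInflateRatio=" + threshold
                + " accepts example ratio: " + accepted(exampleRatio, threshold));
        }
    }
}
```

So 0.009 only loosens the check slightly, while 0 and -1.0 both switch it off in practice, because no positive ratio can fall below them.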

Axel Richter answered Dec 28 '22 08:12
