
Issue decoding an LZMA-compressed zip file in Java using Apache Commons Compress / org.tukaani.xz


I get org.tukaani.xz.UnsupportedOptionsException: Uncompressed size is too big when trying to decode an LZMA-compressed xls file inside a zip. Zip files that do not use LZMA unpack without any issue. In both cases the same xls file was compressed.

I am using Apache Commons Compress and org.tukaani.xz.

Sample code for reference:

package com.concept.utilities.zip;

import java.io.File;
import java.io.IOException;
import java.io.InputStream;

import org.apache.commons.compress.archivers.zip.ZipArchiveEntry;
import org.apache.commons.compress.archivers.zip.ZipFile;
import org.apache.commons.compress.compressors.lzma.LZMACompressorInputStream;

public class ApacheComm {

    public void extractLZMAZip(File zipFile, String compressFileName, String destFolder) {

        ZipFile zip = null;
        try {

            zip = new ZipFile(zipFile);
            ZipArchiveEntry zipArchiveEntry = zip.getEntry(compressFileName);
            if (null != zipArchiveEntry) {
                String name = zipArchiveEntry.getName();

                // InputStream is = zip.getInputStream(zipArchiveEntry);
                InputStream israw = zip.getRawInputStream(zipArchiveEntry);

                LZMACompressorInputStream lzma = new LZMACompressorInputStream(israw);
            }

        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            if (null != zip)
                ZipFile.closeQuietly(zip);
        }
    }

    public static void main(String[] args) throws IOException {

        ApacheComm c = new ApacheComm();
        try {
            c.extractLZMAZip(new File("H:\\archives\\rollLZMA.zip"), "roll.xls", "H:\\archives\\");
        } catch (Exception e) {
            e.printStackTrace();
        }

    }

}

Error

org.tukaani.xz.UnsupportedOptionsException: Uncompressed size is too big
    at org.tukaani.xz.LZMAInputStream.initialize(Unknown Source)
    at org.tukaani.xz.LZMAInputStream.<init>(Unknown Source)
    at org.apache.commons.compress.compressors.lzma.LZMACompressorInputStream.<init>(LZMACompressorInputStream.java:50)
    at com.concept.utilities.zip.ApacheComm.extractLZMAZip(ApacheComm.java:209)
    at com.concept.utilities.zip.ApacheComm.main(ApacheComm.java:224)

Am I missing something? Is there any other way to decode a zip file whose compression method is LZMA?

Raj asked Jul 20 '17 11:07


1 Answer

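Your code isn't working because LZMA-compressed data segments inside a zip have a different header than standalone .lzma files, which is the format LZMACompressorInputStream expects. A standalone .lzma stream starts with a 13-byte header (1 properties byte, a 4-byte dictionary size and an 8-byte uncompressed size), whereas a zip entry stores only a version, a properties size and the properties themselves, with no uncompressed size field. Passing the raw zip entry to LZMACompressorInputStream therefore makes it read unrelated bytes as the uncompressed size, hence the error.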
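
To make the mismatch concrete, here is a small sketch that uses only the JDK. It reads the same bytes twice: once with the 13-byte layout of a standalone .lzma file (1 properties byte, 4-byte dictionary size, 8-byte uncompressed size) and once with the zip layout (2 version bytes, 2-byte properties size, then the properties). The sample bytes are made up for illustration (SDK version 9.20, properties byte 0x5D, 1 MiB dictionary, followed by arbitrary data), and the class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class HeaderMismatchDemo {

    // Hypothetical start of a zip LZMA data segment (made-up values).
    static final byte[] SAMPLE = {
        0x09, 0x14,                    // LZMA SDK major and minor version
        0x05, 0x00,                    // properties size = 5 (little endian)
        0x5D,                          // properties byte
        0x00, 0x00, 0x10, 0x00,        // dictionary size = 0x00100000 (1 MiB)
        0x00, 0x2A, 0x1B, (byte) 0x8C  // first compressed bytes (arbitrary)
    };

    /** Reads the bytes as a standalone .lzma header; returns its "uncompressed size" field. */
    static long sizeIfReadAsLzmaAlone(byte[] data) {
        ByteBuffer b = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN);
        b.get();             // taken as the properties byte, really the SDK major version
        b.getInt();          // taken as the dictionary size, really minor version + size + props
        return b.getLong();  // taken as the size, really dictionary size + compressed data
    }

    /** Reads the bytes as a zip LZMA properties header; returns the dictionary size. */
    static int dictSizeIfReadAsZipLzma(byte[] data) {
        ByteBuffer b = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN);
        b.get();             // SDK major version
        b.get();             // SDK minor version
        b.getShort();        // properties size
        b.get();             // properties byte
        return b.getInt();   // dictionary size
    }

    public static void main(String[] args) {
        System.out.println("as .lzma:    uncompressed size = " + sizeIfReadAsLzmaAlone(SAMPLE));
        System.out.println("as zip LZMA: dictionary size   = " + dictSizeIfReadAsZipLzma(SAMPLE));
    }
}
```

With these sample bytes the .lzma interpretation produces a negative 64-bit value other than -1, which is the kind of size a decoder would reject. The value you get for a real file depends on its contents, so treat this only as an illustration of why the size field comes out as nonsense.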

You can read the specifications at https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT (4.4.4 general purpose bit flag, 5.8 LZMA - Method 14), but to quote the important part:

5.8.5 [...] The LZMA Compressed Data Segment will consist of an LZMA Properties Header followed by the LZMA Compressed Data as shown:

[LZMA properties header for file 1]
[LZMA compressed data for file 1]

[...]

5.8.8 Storage fields for the property information within the LZMA Properties Header are as follows:

LZMA Version Information    2 bytes
LZMA Properties Size        2 bytes
LZMA Properties Data        variable, defined by "LZMA Properties Size"

5.8.8.1 LZMA Version Information - this field identifies which version of the LZMA SDK was used to compress a file. The first byte will store the major version number of the LZMA SDK and the second byte will store the minor number.

5.8.8.2 LZMA Properties Size - this field defines the size of the remaining property data. Typically this size SHOULD be determined by the version of the SDK. This size field is included as a convenience and to help avoid any ambiguity arising in the future due to changes in this compression algorithm.

5.8.8.3 LZMA Property Data - this variable sized field records the required values for the decompressor as defined by the LZMA SDK. The data stored in this field SHOULD be obtained using the WriteCoderProperties() in the version of the SDK defined by the "LZMA Version Information" field.
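
As a rough illustration of the layout quoted above, the following JDK-only sketch parses the first 9 bytes of a zip LZMA data segment. The class name ZipLzmaProperties is hypothetical. The 5-byte properties layout (one properties byte followed by a 4-byte little-endian dictionary size) and the encoding of the properties byte as (pb * 5 + lp) * 9 + lc are taken from the LZMA SDK format rather than from the APPNOTE text itself, so treat them as assumptions of this sketch.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class ZipLzmaProperties {
    final int majorVersion;
    final int minorVersion;
    final int lc;        // literal context bits
    final int lp;        // literal position bits
    final int pb;        // position bits
    final long dictSize; // dictionary size, kept as an unsigned 32-bit value

    private ZipLzmaProperties(int major, int minor, int lc, int lp, int pb, long dictSize) {
        this.majorVersion = major;
        this.minorVersion = minor;
        this.lc = lc;
        this.lp = lp;
        this.pb = pb;
        this.dictSize = dictSize;
    }

    /** Parses the LZMA Properties Header found at the start of a zip LZMA data segment. */
    static ZipLzmaProperties parse(byte[] header) {
        if (header.length < 9) {
            throw new IllegalArgumentException("Need at least 9 header bytes");
        }
        ByteBuffer b = ByteBuffer.wrap(header).order(ByteOrder.LITTLE_ENDIAN);

        // 5.8.8.1: major version first, then minor version
        int major = b.get() & 0xff;
        int minor = b.get() & 0xff;

        // 5.8.8.2: size of the properties data that follows
        int propsSize = b.getShort() & 0xffff;
        if (propsSize != 5) {
            throw new IllegalArgumentException("Unexpected LZMA properties size: " + propsSize);
        }

        // 5.8.8.3: properties byte plus dictionary size (assumed LZMA SDK layout)
        int props = b.get() & 0xff;
        if (props >= 9 * 5 * 5) {
            throw new IllegalArgumentException("Invalid LZMA properties byte: " + props);
        }
        long dictSize = b.getInt() & 0xffffffffL;

        return new ZipLzmaProperties(major, minor, props % 9, (props / 9) % 5, props / 45, dictSize);
    }
}
```

For example, a header of 09 14 05 00 5D 00 00 10 00 would parse as SDK version 9.20 with lc=3, lp=0, pb=2 and a 1 MiB dictionary.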

Code sample:

import org.apache.commons.compress.archivers.zip.ZipArchiveEntry;
import org.apache.commons.compress.archivers.zip.ZipFile;
import org.apache.commons.compress.archivers.zip.ZipMethod;
import org.apache.commons.io.IOUtils;
import org.tukaani.xz.LZMAInputStream;

import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class ApacheComm
{
    public InputStream getInputstreamForEntry(ZipFile zipFile, ZipArchiveEntry ze) throws IOException
    {
        if (zipFile.canReadEntryData(ze))
        {
            return zipFile.getInputStream(ze);
        } else if (ze.getMethod() == ZipMethod.LZMA.getCode()) {
            InputStream inputStream = zipFile.getRawInputStream(ze);
            ByteBuffer buffer = ByteBuffer.wrap(IOUtils.readFully(inputStream, 9))
                    .order(ByteOrder.LITTLE_ENDIAN);

            // LZMA SDK version used to compress this data (informational only)
            int majorVersion = buffer.get();
            int minorVersion = buffer.get();

            // Byte count of the following properties data, stored as an unsigned short.
            // Should be 5 (propByte + dictSize) in all versions
            int size = buffer.getShort() & 0xffff;
            if (size != 5)
                throw new UnsupportedOperationException("Unexpected LZMA properties size: " + size);

            byte propByte = buffer.get();

            // Dictionary size is an unsigned 32-bit little endian integer.
            int dictSize = buffer.getInt();

            long uncompressedSize;
            if ((ze.getRawFlag() & (1 << 1)) != 0)
            {
                // Bit 1 of the general purpose flag is set: the entry ends with an
                // EOS marker, so pass -1 to indicate that the size is unknown
                uncompressedSize = -1;
            } else {
                uncompressedSize = ze.getSize();
            }

            return new LZMAInputStream(inputStream, uncompressedSize, propByte, dictSize);
        } else {
            throw new UnsupportedOperationException("Unsupported compression method: " + ze.getMethod());
        }
    }
}
Rihi answered Oct 01 '22 05:10