
Poor Performance of Java's unzip utilities

Tags: java, unzip

I have noticed that the unzip facility in Java is extremely slow compared to using a native tool such as WinZip.

Is there a third party library available for Java that is more efficient? Open Source is preferred.

Edit

Here is a speed comparison using the Java built-in solution vs 7zip. I added buffered input/output streams in my original solution (thanks Jim, this did make a big difference).

Zip file size: 800K
Java solution: 2.7 seconds
7Zip solution: 204 ms

Here is the modified code using the built-in Java decompression:

/** Unpacks the given zip file using the built-in Java facilities for unzip. */
@SuppressWarnings("unchecked")
public final static void unpack(File zipFile, File rootDir) throws IOException
{
  ZipFile zip = new ZipFile(zipFile);
  Enumeration<ZipEntry> entries = (Enumeration<ZipEntry>) zip.entries();
  while(entries.hasMoreElements()) {
    ZipEntry entry = entries.nextElement();
    java.io.File f = new java.io.File(rootDir, entry.getName());
    if (entry.isDirectory()) { // directory entries are skipped; parent directories are created below
      continue;
    }

    if (!f.exists()) {
      f.getParentFile().mkdirs();
      f.createNewFile();
    }

    BufferedInputStream bis = new BufferedInputStream(zip.getInputStream(entry)); // get the input stream
    BufferedOutputStream bos = new BufferedOutputStream(new java.io.FileOutputStream(f));
    while (bis.available() > 0) {  // write contents of 'bis' to 'bos'
      bos.write(bis.read());
    }
    bos.close();
    bis.close();
  }
}
Tony asked Jul 23 '10 19:07



2 Answers

The problem is not the unzipping, it's the inefficient way you write the unzipped data back to disk. My benchmarks show that using

    InputStream is = zip.getInputStream(entry); // get the input stream
    OutputStream os = new java.io.FileOutputStream(f);
    byte[] buf = new byte[4096];
    int r;
    while ((r = is.read(buf)) != -1) {
      os.write(buf, 0, r);
    }
    os.close();
    is.close();

instead reduces the method's execution time by a factor of 5 (from 5 to 1 second for a 6 MB zip file).

The likely culprit is your use of bis.available(). Aside from being incorrect (available() returns an estimate of the number of bytes that can be read without blocking, not the number of bytes remaining until the end of the stream), this bypasses the buffering provided by BufferedInputStream, requiring a native system call for every byte copied into the output file.
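To illustrate why available() cannot serve as an end-of-stream test, here is a minimal sketch. The class AvailableDemo and its helper methods are hypothetical names made up for this illustration, and the stream is contrived: it deliberately keeps InputStream's default available(), which always returns 0. It is not the stream that ZipFile returns.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

final class AvailableDemo {

    /** Hypothetical stream: serves bytes from an array but inherits the default available(), which returns 0. */
    static InputStream streamWithDefaultAvailable(byte[] data) {
        final ByteArrayInputStream source = new ByteArrayInputStream(data);
        return new InputStream() {
            @Override
            public int read() {
                return source.read();
            }
        };
    }

    /** Flawed loop: treats available() == 0 as end of stream. */
    static byte[] copyWhileAvailable(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while (in.available() > 0) {
            out.write(in.read());
        }
        return out.toByteArray();
    }

    /** Correct loop: read() returning -1 is the only end-of-stream signal. */
    static byte[] copyUntilEof(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "hello zip".getBytes(StandardCharsets.UTF_8);
        System.out.println(copyWhileAvailable(streamWithDefaultAvailable(data)).length); // prints 0
        System.out.println(copyUntilEof(streamWithDefaultAvailable(data)).length);       // prints 9
    }
}
```

The available()-based loop copies nothing here even though nine bytes are waiting, while the loop that checks for -1 copies all of them.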

Note that wrapping the streams in BufferedInputStream and BufferedOutputStream is not necessary if you use the bulk read and write methods as I do above, and that the code to close the resources is not exception safe (if reading or writing fails for any reason, neither is nor os will be closed). Finally, if you have IOUtils from Apache Commons IO on the class path, I recommend using its well-tested IOUtils.copy instead of rolling your own.
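One way the exception-safety point could be handled is with try-with-resources. The following is only a sketch under assumptions: it requires Java 7 or later, the class name SafeUnzip is hypothetical, and unlike the question's code it creates directories for directory entries rather than skipping them.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Enumeration;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

final class SafeUnzip {

    /** Unpacks zipFile into rootDir using bulk reads; all streams are closed even if copying fails. */
    static void unpack(File zipFile, File rootDir) throws IOException {
        try (ZipFile zip = new ZipFile(zipFile)) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                // Entry names are trusted here, as in the question's code.
                File target = new File(rootDir, entry.getName());
                if (entry.isDirectory()) {
                    target.mkdirs();
                    continue;
                }
                File parent = target.getParentFile();
                if (parent != null) {
                    parent.mkdirs();
                }
                // Both streams are closed automatically, in reverse order, on success or failure.
                try (InputStream in = zip.getInputStream(entry);
                     OutputStream out = new FileOutputStream(target)) {
                    byte[] buf = new byte[4096];
                    int n;
                    while ((n = in.read(buf)) != -1) {
                        out.write(buf, 0, n);
                    }
                }
            }
        }
    }
}
```

A call such as SafeUnzip.unpack(new File("archive.zip"), new File("out")) would extract into the out directory; both paths are placeholders.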

meriton answered Oct 08 '22 10:10


Make sure you are feeding the unzip method a BufferedInputStream in your Java application. If you have made the mistake of using an unbuffered input stream, your I/O performance is guaranteed to suck.

Jim Tough answered Oct 08 '22 09:10