I have noticed that the unzip facility in Java is extremely slow compared to using a native tool such as WinZip.
Is there a third party library available for Java that is more efficient? Open Source is preferred.
Edit
Here is a speed comparison using the Java built-in solution vs 7zip. I added buffered input/output streams in my original solution (thanks Jim, this did make a big difference).
Zip File size: 800K Java Solution: 2.7 seconds 7Zip solution: 204 ms
Here is the modified code using the built-in Java decompression:
/** Unpacks the give zip file using the built in Java facilities for unzip. */
@SuppressWarnings("unchecked")
public final static void unpack(File zipFile, File rootDir) throws IOException
{
ZipFile zip = new ZipFile(zipFile);
Enumeration<ZipEntry> entries = (Enumeration<ZipEntry>) zip.entries();
while(entries.hasMoreElements()) {
ZipEntry entry = entries.nextElement();
java.io.File f = new java.io.File(rootDir, entry.getName());
if (entry.isDirectory()) { // if its a directory, create it
continue;
}
if (!f.exists()) {
f.getParentFile().mkdirs();
f.createNewFile();
}
BufferedInputStream bis = new BufferedInputStream(zip.getInputStream(entry)); // get the input stream
BufferedOutputStream bos = new BufferedOutputStream(new java.io.FileOutputStream(f));
while (bis.available() > 0) { // write contents of 'is' to 'fos'
bos.write(bis.read());
}
bos.close();
bis.close();
}
}
Zip4j is written on JDK 8, as some of the features (NIO) that Zip4j supports requires features available only in JDK 8. However, considering the fact that Zip4j is widely used in Android, and to support older versions of Android, Zip4j supports JDK 7 as well.
Methods. getComment(): String – returns the zip file comment, or null if none. getEntry(String name): ZipEntry – returns the zip file entry for the specified name, or null if not found. getInputStream(ZipEntry entry) : InputStream – Returns an input stream for reading the contents of the specified zip file entry.
The problem is not the unzipping, it's the inefficient way you write the unzipped data back to disk. My benchmarks show that using
InputStream is = zip.getInputStream(entry); // get the input stream
OutputStream os = new java.io.FileOutputStream(f);
byte[] buf = new byte[4096];
int r;
while ((r = is.read(buf)) != -1) {
os.write(buf, 0, r);
}
os.close();
is.close();
instead reduces the method's execution time by a factor of 5 (from 5 to 1 second for a 6 MB zip file).
The likely culprit is your use of bis.available()
. Aside from being incorrect (available returns the number of bytes until a call to read would block, not until the end of the stream), this bypasses the buffering provided by BufferedInputStream, requiring a native system call for every byte copied into the output file.
Note that wrapping in a BufferedStream is not necessary if you use the bulk read and write methods as I do above, and that the code to close the resources is not exception safe (if reading or writing fails for any reason, neither is
nor os
would be closed). Finally, if you have IOUtils in the class path, I recommend using their well tested IOUtils.copy
instead of rolling your own.
Make sure you are feeding the unzip method a BufferedInputStream in your Java application. If you have made the mistake of using an unbuffered input stream your IO performance is guaranteed to suck.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With