I'm trying to decompress about 8000 files in gzip format in Java. My first try was to use GZIPInputStream but the performance was awful.
Does anyone know of an alternative for decompressing gzip archives? I tried ZipInputStream, but it doesn't recognize the gzip format.
Thank you in advance.
You need to use buffering. Reading or writing small pieces of data is going to be inefficient. The compression implementation is in native code in the Sun JDK. Even if it weren't, the buffered performance should usually exceed reasonable file or network I/O.
OutputStream out = new BufferedOutputStream(new GZIPOutputStream(rawOut));
InputStream in = new BufferedInputStream(new GZIPInputStream(rawIn));
As native code is used to implement the compression/decompression algorithm, be very careful to close the stream (and not just the underlying stream) after use. I've found that having loads of Deflater instances hanging around is very bad for performance.
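To make the buffering and closing advice concrete, here is a minimal sketch of decompressing a single file. The class name GzipFiles, the method name decompress, and the 64 KB buffer size are assumptions chosen for illustration, not something from the original answer. The try-with-resources block closes the GZIPInputStream itself, which is what releases the native resources it holds.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.GZIPInputStream;

// Hypothetical helper class, named for illustration only.
class GzipFiles {
    // Assumed buffer size; tune it for your own workload.
    private static final int BUFFER_SIZE = 64 * 1024;

    // Decompresses one gzip file to target and returns the number of bytes written.
    static long decompress(Path source, Path target) throws IOException {
        // try-with-resources closes the GZIPInputStream (not just the raw file
        // stream), so its native inflater is released promptly.
        try (InputStream in = new GZIPInputStream(
                 new BufferedInputStream(Files.newInputStream(source), BUFFER_SIZE), BUFFER_SIZE);
             OutputStream out = new BufferedOutputStream(Files.newOutputStream(target), BUFFER_SIZE)) {
            byte[] buffer = new byte[BUFFER_SIZE];
            long total = 0;
            int n;
            // Copy in large chunks rather than one byte at a time.
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
                total += n;
            }
            return total;
        }
    }
}
```

A call would look like GzipFiles.decompress(Paths.get("data.gz"), Paths.get("data")), where both paths are placeholders.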
ZipInputStream deals with archives of files, which is a completely different thing from compressing a stream.
When you say that GZIPInputStream's performance was awful, could you be more specific? Did you find out whether it was a CPU bottleneck or an I/O bottleneck? Were you using buffering on both input and output? If you could post the code you were using, that would be very helpful.
If you're on a multi-core machine, you could try still using GZIPInputStream but with multiple threads, one per core, and a shared queue of files still to process. (Any one file would only be processed by a single thread.) That might make things worse if you're I/O bound, but it may be worth a try.
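One way to sketch the multi-threaded idea is with a fixed-size thread pool from java.util.concurrent, whose internal work queue stands in for the shared queue of files. This is a swapped-in convenience, not necessarily what the answer had in mind. The names ParallelGunzip and decompressAll, and the rule of stripping a trailing ".gz" to form the output name, are assumptions for illustration.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.zip.GZIPInputStream;

// Hypothetical class, named for illustration only.
class ParallelGunzip {
    // Decompresses a single file with buffering on both sides.
    static void decompress(Path source, Path target) throws IOException {
        try (InputStream in = new GZIPInputStream(new BufferedInputStream(Files.newInputStream(source)));
             OutputStream out = new BufferedOutputStream(Files.newOutputStream(target))) {
            byte[] buffer = new byte[64 * 1024]; // assumed chunk size
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
        }
    }

    // Decompresses every file into outputDir, one task per file, and
    // returns the number of files that failed.
    static int decompressAll(List<Path> sources, Path outputDir) throws InterruptedException {
        // One worker per core; the pool's queue holds the files still to process.
        int threads = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<Future<Void>> results = new ArrayList<>();
        for (Path source : sources) {
            // Assumed naming rule: drop a trailing ".gz" from the file name.
            String name = source.getFileName().toString().replaceFirst("\\.gz$", "");
            Path target = outputDir.resolve(name);
            Callable<Void> task = () -> {
                decompress(source, target); // each file is handled by exactly one thread
                return null;
            };
            results.add(pool.submit(task));
        }
        pool.shutdown(); // accept no new tasks; queued ones still run
        int failures = 0;
        for (Future<Void> result : results) {
            try {
                result.get(); // wait for this file to finish
            } catch (ExecutionException e) {
                failures++; // the task threw, for example an IOException
            }
        }
        return failures;
    }
}
```

If the job turns out to be I/O bound, a smaller pool than one thread per core may do better, so the thread count is worth measuring rather than assuming.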