
Decompress a Gzip archive in Java

Tags: java, gzip, archive

I'm trying to decompress about 8000 files in gzip format in Java. My first attempt used GZIPInputStream, but the performance was awful.

Does anyone know of an alternative way to decompress gzip archives? I tried ZipInputStream, but it doesn't recognize the gzip format.

Thank you in advance.

Rui Carneiro asked Mar 10 '09 17:03


2 Answers

You need to use buffering. Reading or writing small pieces of data at a time is inefficient. The compression implementation is in native code in the Sun JDK. Even if it weren't, buffered throughput should usually exceed what reasonable file or network I/O can deliver.

// rawOut and rawIn stand for the underlying file or network streams.
OutputStream out = new BufferedOutputStream(new GZIPOutputStream(rawOut));

InputStream in = new BufferedInputStream(new GZIPInputStream(rawIn));

Because native code implements the compression/decompression algorithm, be very careful to close the gzip stream itself (and not just the underlying stream) after use. I've found that having loads of `Deflater`s hanging around is very bad for performance.

ZipInputStream deals with archives of files, which is a completely different thing from compressing a stream.
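
The buffering and closing advice above could look roughly like this for a single file. This is only a sketch: it assumes Java 7 or later (for try-with-resources and `java.nio.file`), and the class name `GunzipFile`, the method `gunzip`, and the 64 KB buffer size are illustrative choices, not anything from the question.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.GZIPInputStream;

public class GunzipFile {
    // Assumed buffer size; measure before settling on a value.
    private static final int BUFFER_SIZE = 64 * 1024;

    // Decompresses one gzip file from source to target (hypothetical helper).
    public static void gunzip(Path source, Path target) throws IOException {
        // try-with-resources closes the streams in reverse order. Closing "in"
        // closes the wrapped GZIPInputStream, which releases its Inflater.
        try (InputStream rawIn = Files.newInputStream(source);
             InputStream in = new BufferedInputStream(
                     new GZIPInputStream(rawIn, BUFFER_SIZE), BUFFER_SIZE);
             OutputStream out = new BufferedOutputStream(
                     Files.newOutputStream(target), BUFFER_SIZE)) {
            byte[] chunk = new byte[BUFFER_SIZE];
            int n;
            // Copy in large chunks rather than byte by byte.
            while ((n = in.read(chunk)) != -1) {
                out.write(chunk, 0, n);
            }
        }
    }
}
```

Because the copy loop already reads in large chunks, the outer `BufferedInputStream` matters mostly when the consumer reads a few bytes at a time; it is kept here to match the wrapping shown above.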

Tom Hawtin - tackline answered Sep 22 '22 09:09


When you say that GZIPInputStream's performance was awful, could you be more specific? Did you find out whether it was a CPU bottleneck or an I/O bottleneck? Were you using buffering on both input and output? If you could post the code you were using, that would be very helpful.

If you're on a multi-core machine, you could keep using GZIPInputStream but with multiple threads, one per core, and a shared queue of files still to process. (Any one file would only be processed by a single thread.) That might make things worse if you're I/O bound, but it may be worth a try.
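
One way to try the multi-threaded idea is a fixed-size thread pool, whose internal work queue plays the role of the shared queue of files. The sketch below assumes Java 8 or later; `ParallelGunzip`, `gunzipAll`, `gunzipOne`, and the rule of stripping a trailing `.gz` from the output name are hypothetical choices made for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.zip.GZIPInputStream;

public class ParallelGunzip {
    // Decompresses every file in sources into targetDir using one thread per core.
    public static void gunzipAll(List<Path> sources, Path targetDir)
            throws InterruptedException, ExecutionException {
        int threads = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Path>> pending = new ArrayList<>();
            for (Path source : sources) {
                // Each file is one task, so it is handled by exactly one thread.
                Callable<Path> task = () -> gunzipOne(source, targetDir);
                pending.add(pool.submit(task));
            }
            for (Future<Path> f : pending) {
                f.get(); // rethrows any failure wrapped in ExecutionException
            }
        } finally {
            pool.shutdown();
        }
    }

    // Decompresses a single file; the output naming rule is an assumption.
    static Path gunzipOne(Path source, Path targetDir) throws IOException {
        String name = source.getFileName().toString();
        String outName = name.endsWith(".gz")
                ? name.substring(0, name.length() - 3)
                : name + ".out";
        Path target = targetDir.resolve(outName);
        // Closing the GZIPInputStream releases its native Inflater.
        try (InputStream rawIn = Files.newInputStream(source);
             InputStream in = new GZIPInputStream(rawIn, 64 * 1024)) {
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
        return target;
    }
}
```

A call such as `ParallelGunzip.gunzipAll(files, outputDir)` would then process the whole batch. As noted above, this only helps if the work is CPU bound, so it is worth timing against the single-threaded version.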

Jon Skeet answered Sep 26 '22 09:09