I am unzipping a huge gz file in Java. The gz file is about 2 GB and the unzipped file is about 6 GB. From time to time the unzipping process takes forever (hours); other times it finishes in a reasonable time (under 10 minutes, or quicker).
I have a fairly powerful box (8 GB RAM, 4 CPUs). Is there a way to improve the code below, or should I use a completely different library?
Also, I passed -Xms256m and -Xmx4g to the VM.
public static File unzipGZ(File file, File outputDir) {
    GZIPInputStream in = null;
    OutputStream out = null;
    File target = null;
    try {
        // Open the compressed file
        in = new GZIPInputStream(new FileInputStream(file));
        // Open the output file
        target = new File(outputDir, FileUtil.stripFileExt(file.getName()));
        out = new FileOutputStream(target);
        // Transfer bytes from the compressed file to the output file
        byte[] buf = new byte[1024];
        int len;
        while ((len = in.read(buf)) > 0) {
            out.write(buf, 0, len);
        }
        // Close the file and stream
        in.close();
        out.close();
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        if (in != null) {
            try {
                in.close();
            } catch (IOException e) {
                // TODO Auto-generated catch block
                e.printStackTrace();
            }
        }
        if (out != null) {
            try {
                out.close();
            } catch (IOException e) {
                // TODO Auto-generated catch block
                e.printStackTrace();
            }
        }
    }
    return target;
}
I don't know how much buffering is applied by default, if any - but you might want to try wrapping both the input and output in a BufferedInputStream / BufferedOutputStream. You could also try increasing your buffer size - 1K is a pretty small buffer. Experiment with different sizes, e.g. 16K, 64K etc. These should make the use of BufferedInputStream rather less important, of course.
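As a rough sketch of what I mean, something like the following lets you try different buffer sizes. The class name BufferedGunzip and the method gunzip are made up for illustration, and the use of java.nio.file.Path and try-with-resources (Java 7+) is my assumption, not something taken from your code:

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.GZIPInputStream;

// Hypothetical helper class, not part of the question's code base.
final class BufferedGunzip {

    // Decompresses src into dst and returns the number of uncompressed bytes written.
    // bufferSize is used for the stream wrappers and for the copy array alike,
    // so a single parameter can be varied when experimenting (e.g. 16K, 64K).
    static long gunzip(Path src, Path dst, int bufferSize) throws IOException {
        long total = 0;
        try (InputStream in = new GZIPInputStream(
                     new BufferedInputStream(Files.newInputStream(src), bufferSize),
                     bufferSize);
             OutputStream out = new BufferedOutputStream(
                     Files.newOutputStream(dst), bufferSize)) {
            byte[] buf = new byte[bufferSize];
            int len;
            // read() returns -1 at end of stream
            while ((len = in.read(buf)) != -1) {
                out.write(buf, 0, len);
                total += len;
            }
        }
        return total;
    }
}
```

Calling BufferedGunzip.gunzip(src, dst, 64 * 1024) and timing it for a few sizes should show whether the buffer size matters at all on your machine.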
On the other hand, I suspect this isn't really the problem. If it sometimes finishes in 10 minutes and sometimes takes hours, that suggests something very odd is going on. When it takes a very long time, is it actually making progress? Is the output file increasing in size? Is it using significant CPU? Is the disk constantly in use?
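One cheap way to answer the "is it making progress?" question is to log from inside the copy loop. This is only a sketch: ProgressCopy, copyWithProgress and the reportEveryBytes parameter are names I invented, and it writes to System.err purely for convenience:

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical helper class for diagnosing a slow copy.
final class ProgressCopy {

    // Copies in to out, printing a line to System.err roughly every
    // reportEveryBytes bytes. Returns the total number of bytes copied.
    static long copyWithProgress(InputStream in, OutputStream out, long reportEveryBytes)
            throws IOException {
        byte[] buf = new byte[64 * 1024];
        long total = 0;
        long nextReport = reportEveryBytes;
        long start = System.nanoTime();
        int len;
        while ((len = in.read(buf)) != -1) {
            out.write(buf, 0, len);
            total += len;
            if (total >= nextReport) {
                long elapsedMs = (System.nanoTime() - start) / 1_000_000;
                System.err.println(total + " bytes written after " + elapsedMs + " ms");
                nextReport += reportEveryBytes;
            }
        }
        return total;
    }
}
```

If the log lines keep coming at a steady rate during a slow run, the decompression itself is slow; if they stall, the process is probably waiting on something else (disk, memory, another process).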
One side note: since you're already closing both streams (in and out) in the finally block, you don't need to close them in the try block as well.