
Working with Zip and GZip files in Java

It's been a while since I've done Java I/O, and I'm not aware of the latest "right" ways to work with Zip and GZip files. I don't necessarily need a full working demo - I'm primarily looking for the right interfaces and methods to be using. Yes, I could look up any random tutorial on this, but performance is an issue (these files can get pretty big) and I do care about using the best tool for the job.

The basic process I'll be implementing:

  • Download a bunch of files (that might be zipped, gzipped, or both) to a temp folder.
  • Add all the extracted files to a new zip file in a temp folder.

The input files might be compressed and archived more than once. For example, the "full extraction" should take any of the following inputs (I'm not in control of these), and leave behind foo.txt:

  • foo.txt.gz
  • foo.txt.zip
  • foo.txt.gz.zip
  • foo.txt.zip.gz
  • ...
  • foo.txt.gz.gz.gz.zip.gz.zip.zip.gz.gz
  • ...

Then, I might be left with foo.txt, bar.mp3, baz.exe - so I would just add them all to a new zip file with some generic name.

Questions:

  • With file size being a potential concern, which (interfaces/classes/methods) should I use to quickly:
    • extract zip files?
    • extract gzip files?
    • write zip files?
  • Am I better off keeping the individual extracted files in memory before writing back to the disk? Or,
  • Do potentially large files make that a bad idea?
Matt Ball asked Sep 14 '10 17:09



1 Answer

Don't hold all of the uncompressed data in memory, or you might run out of heap space. Instead, stream the data out to a file as you uncompress it, then stream it back in from that file when you create the final zip file.
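As a rough illustration of the last step, here is a minimal sketch that builds a zip by streaming each file through a small fixed buffer with java.util.zip.ZipOutputStream. The class name ZipWriterSketch and the helper zipFiles are hypothetical names chosen for this example. It assumes Java 7 or later and that the extracted files have unique names.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class ZipWriterSketch {

    // Hypothetical helper: copies each file into a new zip through one small
    // buffer, so heap use does not grow with the size of the files.
    public static void zipFiles(List<Path> files, Path zipFile) throws IOException {
        byte[] buf = new byte[8192];
        try (OutputStream fileOut = Files.newOutputStream(zipFile);
             ZipOutputStream zip = new ZipOutputStream(fileOut)) {
            for (Path file : files) {
                // Assumption: entries are named after the file only (no directories).
                // A duplicate name makes putNextEntry throw a ZipException.
                zip.putNextEntry(new ZipEntry(file.getFileName().toString()));
                try (InputStream in = Files.newInputStream(file)) {
                    int len;
                    // read() returns -1 at end of stream
                    while ((len = in.read(buf)) != -1) {
                        zip.write(buf, 0, len);
                    }
                }
                zip.closeEntry();
            }
        }
    }
}
```

Calling ZipWriterSketch.zipFiles with the list of extracted files and a target path would produce the single output archive described in the question.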

I haven't worked with zip files before, but here is an example that shows how to uncompress a gzipped file:

import java.io.*;
import java.util.zip.*;

public class GunzipExample {
    public static void main(String[] args) {
        // Uncompress a gzipped file.
        // try-with-resources (Java 7+) closes both streams, even on error.
        try (InputStream in = new GZIPInputStream(new FileInputStream("file.txt.gz"));
             OutputStream out = new FileOutputStream("file.txt")) {
            byte[] buf = new byte[1024 * 4];
            int len;
            // read() returns -1 at end of stream
            while ((len = in.read(buf)) != -1) {
                out.write(buf, 0, len);
            }
        }
        catch (IOException e) {
            e.printStackTrace();
        }
    }
}
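The same streaming approach might be extended to zip input and to files wrapped several times (foo.txt.gz.zip and so on). The following is only a sketch under stated assumptions: UnwrapSketch and unwrap are made-up names, each layer is detected by its leading magic bytes (0x1f 0x8b for gzip, PK\003\004 for zip) rather than by file extension, and the layers are chained as streams so that only the innermost file is written to disk. It uses java.util.zip.ZipInputStream and GZIPInputStream from the standard library.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.zip.GZIPInputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

public class UnwrapSketch {

    // Hypothetical helper: peels gzip/zip layers off 'in' and writes the
    // innermost file(s) into 'targetDir'. The caller is expected to close 'in'.
    public static void unwrap(InputStream in, String name, Path targetDir) throws IOException {
        BufferedInputStream buffered = new BufferedInputStream(in);

        // Peek at the first four bytes, then rewind.
        byte[] magic = new byte[4];
        buffered.mark(magic.length);
        int n = 0;
        while (n < magic.length) {
            int r = buffered.read(magic, n, magic.length - n);
            if (r == -1) break;
            n += r;
        }
        buffered.reset();

        boolean gzip = n >= 2 && (magic[0] & 0xff) == 0x1f && (magic[1] & 0xff) == 0x8b;
        boolean zip = n == 4 && magic[0] == 'P' && magic[1] == 'K'
                && magic[2] == 3 && magic[3] == 4;

        // Inner wrappers are left unclosed on purpose: closing one would also
        // close the shared underlying stream.
        if (gzip) {
            String inner = name.endsWith(".gz") ? name.substring(0, name.length() - 3) : name;
            unwrap(new GZIPInputStream(buffered), inner, targetDir);
        } else if (zip) {
            ZipInputStream zipIn = new ZipInputStream(buffered);
            ZipEntry entry;
            while ((entry = zipIn.getNextEntry()) != null) {
                if (!entry.isDirectory()) {
                    // Reading zipIn yields the current entry's data only.
                    unwrap(zipIn, entry.getName(), targetDir);
                }
            }
        } else {
            // Plain data: keep only the last path element of the name and stream
            // it to disk. Files.copy throws if the target already exists; a
            // production version should validate names more strictly.
            Path fileName = Paths.get(name).getFileName();
            Files.copy(buffered, targetDir.resolve(fileName));
        }
    }
}
```

A caller would open the downloaded file with Files.newInputStream inside a try-with-resources block and pass the stream, the original file name, and the temp folder to unwrap. Writing each intermediate layer to a temp file instead, as the answer suggests, is simpler to debug at the cost of extra disk I/O.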
dogbane answered Sep 19 '22 12:09
