Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

java.util.zip - ZipInputStream v.s. ZipFile

I have some general questions regarding the java.util.zip library. What we basically do is an import and an export of many small components. Previously these components were imported and exported using a single big file, e.g.:

<component-type-a id="1"/>
<component-type-a id="2"/>
<component-type-a id="N"/>

<component-type-b id="1"/>
<component-type-b id="2"/>
<component-type-b id="N"/>

Please note that the order of the components during import is relevant.

Now every component should occupy its own file which should be externally versioned, QA-ed, bla, bla. We decided that the output of our export should be a zip file (with all these files in) and the input of our import should be a similar zip file. We do not want to explode the zip in our system. We do not want opening separate streams for each of the small files. My current questions:

Q1. May the ZipInputStream guarantee that the zip entries (the little files) will be read in the same order in which they were inserted by our export that uses ZipOutputStream? I assume reading is something like:


ZipInputStream zis = new ZipInputStream(new BufferedInputStream(fis));
ZipEntry entry;
while((entry = zis.getNextEntry()) != null) 
{
       //read from zis until available
}

I know that the central zip directory is put at the end of the zip file but nevertheless the file entries inside have sequential order. I also know that relying on the order is an ugly idea but I just want to have all the facts in mind.

Q2. If I use ZipFile (which I prefer) what is the performance impact of calling getInputStream() hundreds of times? Will it be much slower than the ZipInputStream solution? The zip is opened only once and ZipFile is backed by RandomAccessFile - is this correct? I assume reading is something like:


ZipFile zipfile = new ZipFile(argv[0]);
Enumeration e = zipfile.entries();//TODO: assure the order of the entries
while(e.hasMoreElements()) {
        entry = (ZipEntry) e.nextElement();
        is = zipfile.getInputStream(entry));
}

Q3. Are the input streams retrieved from the same ZipFile thread safe (e.g. may I read different entries in different threads simultaneously)? Any performance penalties?

Thanks for your answers!

like image 671
Lachezar Balev Avatar asked Jan 11 '11 17:01

Lachezar Balev


People also ask

What is ZipInputStream in Java?

public class ZipInputStream extends InflaterInputStream. This class implements an input stream filter for reading files in the ZIP file format. Includes support for both compressed and uncompressed entries.

Can Java read ZIP files?

Java API provides extensive support to read Zip files, all classes related to zip file processing are located in the java. util. zip package. One of the most common tasks related to zip archive is to read a Zip file and display what entries it contains, and then extract them in a folder.

How do I read a ZipInputStream file?

ZipInputStream. read(byte[] buf, int off, int len) method reads from the current ZIP entry into an array of bytes. If len is not zero, the method blocks until some input is available; otherwise, no bytes are read and 0 is returned.

How can I read the content of a zip file without unzipping it in Java?

Methods. getComment(): String – returns the zip file comment, or null if none. getEntry(String name): ZipEntry – returns the zip file entry for the specified name, or null if not found. getInputStream(ZipEntry entry) : InputStream – Returns an input stream for reading the contents of the specified zip file entry.


1 Answers

Q1: yes, order will be the same in which entries were added.

Q2: note that due to structure of zip archive files, and compression, none of solutions is exactly streaming; they all do some level of buffering. And if you check out JDK sources, implementations share most code. There is no real random access to within content, although index does allow finding chunks that correspond to entries. So I think there should not be meaningful performance differences; especially as OS will do caching of disk blocks anyway. You may want to just test performance to verify this with a simple test case.

Q3: I would not count on this; and most likely they aren't. If you really think concurrent access would help (mostly because decompression is CPU bound, so it might help), I'd try reading the whole file in memory, expose via ByteArrayInputStream, and construct multiple independent readers.

like image 105
StaxMan Avatar answered Sep 24 '22 21:09

StaxMan