
Can zip files be sparse/non-contiguous?

The zip file format ends with a central directory section that then points to the individual zip entries within the file. This appears to allow zip entries to occur anywhere within the zip file itself. Indeed, self-extracting zip files are a good example: they start with an executable and all the zip entries occur after the executable bytes.

The question is: does the zip file format really allow sparse or non-contiguous zip entries, for example with empty or otherwise unaccounted-for bytes between entries? Both the definitive PKWARE APPNOTE and the Wikipedia article seem to allow this. Will all or most typical zip utilities work with such sparse zip files?
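To make "unaccounted bytes" concrete, here is a minimal sketch in Python that uses the central directory to work out which byte ranges the entries occupy and reports anything in between. The names `entry_spans` and `find_gaps` are made up for illustration. It assumes entries without a trailing data descriptor and ignores any space between the last entry and the central directory.

```python
import io
import struct
import zipfile

# Fixed 30-byte part of a local file header: signature, version, flags, method,
# time, date, crc, compressed size, uncompressed size, name length, extra length.
LOCAL_HEADER = struct.Struct("<4s5H3L2H")

def entry_spans(data: bytes):
    """Return (name, start, end) byte ranges, sorted by start, per the central directory."""
    spans = []
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        for info in zf.infolist():
            fields = LOCAL_HEADER.unpack_from(data, info.header_offset)
            name_len, extra_len = fields[9], fields[10]
            # Assumption: no data descriptor follows the compressed data.
            end = (info.header_offset + LOCAL_HEADER.size
                   + name_len + extra_len + info.compress_size)
            spans.append((info.filename, info.header_offset, end))
    return sorted(spans, key=lambda span: span[1])

def find_gaps(data: bytes):
    """Return (start, end) ranges before or between entries that no entry covers."""
    gaps = []
    pos = 0
    for _name, start, end in entry_spans(data):
        if start > pos:
            gaps.append((pos, start))
        pos = max(pos, end)
    return gaps
```

For an archive written by an ordinary tool, `find_gaps` should return an empty list. Prepending bytes, as a self-extracting stub does, shows up as a single gap at the front.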

The use case is this: I want to be able to delete or replace zip entries in a zip file. To do this, typical libraries such as minizip want you to copy the entire zip file to a new one, skipping the deleted or replaced entry, which seems wasteful and slow.

Wouldn't it be better to over-allocate, say 1.5x the storage for an entry, so that when deleting or replacing an entry you could find the unallocated bytes and reuse them directly? With a 1.5x growth factor, an entry that keeps growing is relocated only occasionally, so the total cost of reallocation stays amortized linear in its final size, much as with a growable array. It would be similar to file system block allocation, though probably not as sophisticated.

This would also help with the many zip-based file formats out there. Instead of keeping the unzipped files in a temp directory (or in memory) for editing and then rezipping the lot back into the file format, only the changed portions of the zip file would need to be rewritten.

Are there any C/C++ libraries out there that do this?

Glen Low asked Sep 12 '12 10:09



1 Answer

No. Reading the central directory is optional. Zip decoders can, and some do, simply read the zip file sequentially from the beginning, expecting to see the local headers and entry data contiguously. They can complete the job of decoding without ever having looked at the central directory.
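As a rough illustration of that sequential style, the following Python sketch walks local headers from the first byte and never touches the central directory. The function names are hypothetical, and the reader is deliberately simplified: it only lists names and gives up on entries that use a data descriptor. `demo_archives` builds one ordinary archive and one copy with 16 stray bytes inserted between the two entries.

```python
import io
import struct
import zipfile

LOCAL_SIG = b"PK\x03\x04"
# Fixed 30-byte part of a local file header.
LOCAL_HEADER = struct.Struct("<4s5H3L2H")

def stream_entry_names(data: bytes):
    """List entry names by walking local headers front to back."""
    names = []
    pos = 0
    while data[pos:pos + 4] == LOCAL_SIG:
        (_sig, _ver, flags, _method, _time, _date, _crc,
         csize, _usize, name_len, extra_len) = LOCAL_HEADER.unpack_from(data, pos)
        if flags & 0x08:
            # Sizes live in a trailing data descriptor; out of scope for this sketch.
            raise ValueError("entry uses a data descriptor")
        start = pos + LOCAL_HEADER.size
        names.append(data[start:start + name_len].decode("utf-8", "replace"))
        # The next header is expected immediately after this entry's data.
        pos = start + name_len + extra_len + csize
    return names

def demo_archives():
    """Return (contiguous, gapped): a two-entry archive and a copy with a 16-byte hole."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("a.txt", b"alpha")
        zf.writestr("b.txt", b"bravo")
        second = zf.getinfo("b.txt").header_offset
    data = buf.getvalue()
    # The central directory offsets in the gapped copy are stale, but this
    # reader never looks at them.
    return data, data[:second] + bytes(16) + data[second:]
```

On the contiguous archive the walk reaches both entries. On the gapped copy it stops after the first one, because the bytes where it expects the next local header are not a header.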

In order to do what you want, you would need to put in dummy zip entries between the useful entries in order to hold that space. At least if you want to be compatible with the rest of the zip world.
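A minimal sketch of the dummy-entry idea, using Python's standard `zipfile` module: each real entry is followed by a stored entry of zero bytes that reserves room for growth. The `.pad` suffix, the `slack` parameter and the function name are assumptions chosen for illustration, not part of any convention. Reusing the reserved space later would still require rewriting headers and the central directory, which this sketch does not attempt.

```python
import io
import zipfile

def write_with_padding(entries, slack=0.5):
    """Write entries to an in-memory zip, each followed by a stored filler entry.

    entries maps names to bytes; slack is the reserved fraction of each
    entry's compressed size (0.5 gives roughly 1.5x total allocation).
    """
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, payload in entries.items():
            zf.writestr(name, payload)
            reserve = int(zf.getinfo(name).compress_size * slack)
            # Hypothetical naming scheme: the filler sits right after its entry
            # and is stored uncompressed so it occupies exactly `reserve` bytes of data.
            zf.writestr(name + ".pad", bytes(reserve),
                        compress_type=zipfile.ZIP_STORED)
    return buf.getvalue()
```

Because the filler is a regular entry, the archive stays contiguous and readable by sequential decoders. The cost is that the filler entries are visible to every tool that lists or extracts the archive.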

Mark Adler answered Oct 06 '22 00:10