What compression/archive formats support inter-file compression?

Question

This question on archiving PDFs got me wondering: if I wanted to compress (for archival purposes) lots of files that are essentially small changes made on top of a master template (a letterhead), it seems like huge compression gains could be had with inter-file compression.

Do any of the standard compression/archiving formats support this? AFAIK, all the popular formats focus on compressing each single file.

CesarB · Accepted Answer

Several formats do inter-file compression.

The oldest example is .tar.gz. A .tar archive applies no compression; it simply concatenates all the files, with a header before each one, while gzip can compress only a single file. Applying the two in sequence compresses the whole archive as one stream, which is the traditional approach in the Unix world. .tar.bz2 is the same, only with bzip2 instead of gzip.
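A rough way to see the effect, sketched in Python with the standard zlib module (the same DEFLATE algorithm gzip uses). The helper names and the synthetic 4 KB random "template" are assumptions made up for illustration; joining the byte strings stands in for what tar does, minus the per-file headers.

```python
import random
import zlib


def make_documents(count=20, template_size=4096, seed=0):
    """Build hypothetical documents: one shared template plus a short unique tail."""
    rng = random.Random(seed)
    # Random bytes are incompressible on their own, so any gain must come
    # from redundancy *between* documents.
    template = bytes(rng.randrange(256) for _ in range(template_size))
    return [template + f"Letter number {i}\n".encode() for i in range(count)]


def per_file_size(docs):
    """Total size when every document is compressed separately."""
    return sum(len(zlib.compress(d, 9)) for d in docs)


def solid_size(docs):
    """Size when the documents are concatenated first, then compressed once."""
    return len(zlib.compress(b"".join(docs), 9))


docs = make_documents()
separate = per_file_size(docs)
solid = solid_size(docs)
print(f"compressed separately: {separate} bytes")
print(f"concatenated first:    {solid} bytes")
```

The concatenated stream should come out far smaller here, because each repeated copy of the template sits close enough to the previous one for the compressor to reference it.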

More recent examples are formats with optional "solid" compression (for instance, RAR and 7-Zip), which can internally concatenate all the files before compressing, if enabled by a command-line flag or GUI option.

Edward Kmett · Answer

Take a look at Google's open-vcdiff.

http://code.google.com/p/open-vcdiff/

It is designed for calculating small compressed deltas and implements RFC 3284.

http://www.ietf.org/rfc/rfc3284.txt
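To give a feel for the idea of compressing a document against a known template, here is a small Python sketch. Note that this is not VCDIFF and does not use open-vcdiff; it swaps in zlib's preset-dictionary feature (the zdict argument) as a stand-in, and the function names and the random 8 KB template are assumptions for illustration only.

```python
import random
import zlib


def compress_against(template: bytes, document: bytes) -> bytes:
    """Compress document, letting DEFLATE reference bytes from template."""
    comp = zlib.compressobj(level=9, zdict=template)
    return comp.compress(document) + comp.flush()


def decompress_against(template: bytes, payload: bytes) -> bytes:
    """Inverse of compress_against; the same template must be supplied."""
    decomp = zlib.decompressobj(zdict=template)
    return decomp.decompress(payload) + decomp.flush()


# Hypothetical master template: 8 KB of random (incompressible) bytes.
rng = random.Random(0)
template = bytes(rng.randrange(256) for _ in range(8 * 1024))
document = template + b"Dear customer, your invoice number is 12345.\n"

plain = zlib.compress(document, 9)
delta = compress_against(template, document)
print(f"without template: {len(plain)} bytes")
print(f"with template:    {len(delta)} bytes")
```

The template has to be stored once and kept alongside the deltas, and zlib only uses up to 32 KB of dictionary, so this stand-in shares the window limit of gzip.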

Microsoft has an API for doing something similar, sans any semblance of a standard.

In general the algorithms you are looking for are ones based on Bentley/McIlroy:

http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.11.8470

In particular, these algorithms will be a win if the template is larger than the window size (~32 KB) used by gzip or the block size (100-900 KB) used by bzip2.
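The window limit can be checked with a quick Python experiment using the standard zlib module (DEFLATE, as in gzip). This is only a sketch: the helper names and the random-byte templates are assumptions chosen so that the only redundancy is the repeated copy.

```python
import random
import zlib


def random_template(size, seed=0):
    """Hypothetical incompressible template of the given size in bytes."""
    rng = random.Random(seed)
    return bytes(rng.randrange(256) for _ in range(size))


def second_copy_cost(template):
    """Extra compressed bytes needed to store a second copy of the template."""
    one = len(zlib.compress(template, 9))
    two = len(zlib.compress(template + template, 9))
    return two - one


small = random_template(8 * 1024)    # fits inside the 32 KB window
large = random_template(64 * 1024)   # larger than the 32 KB window

small_cost = second_copy_cost(small)
large_cost = second_copy_cost(large)
print(f"8 KB template, cost of 2nd copy:  {small_cost} bytes")
print(f"64 KB template, cost of 2nd copy: {large_cost} bytes")
```

With the 8 KB template the second copy is nearly free; with the 64 KB template the first copy has already slid out of the window by the time the second one starts, so it costs about as much as the first.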

They are used by Google internally in their Bigtable implementation to store compressed web pages, for much the same reason you are seeking them.
