
Fast Concatenation of Multiple GZip Files

I have a list of gzip files:

file1.gz file2.gz file3.gz 

Is there a way to concatenate these files, or gzip them into a single gzip file, without having to decompress them first?

In practice we will use this in a web database (CGI): the web application receives a query from the user, lists all the files that match the query, and returns them to the user as a single batch file.

neversaint asked Nov 04 '11 05:11




2 Answers

With gzip files, you can simply concatenate the files together, like so:

cat file1.gz file2.gz file3.gz > allfiles.gz 

Per the gzip RFC,

A gzip file consists of a series of "members" (compressed data sets). [...] The members simply appear one after another in the file, with no additional information before, between, or after them.

Note that this is not exactly the same as building a single gzip file of the concatenated data; among other things, all of the original filenames are preserved. However, gunzip seems to handle it as equivalent to a concatenation.
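As a quick check of that equivalence, here is a minimal sketch in Python. The payloads are hypothetical stand-ins for file1, file2 and file3; joining the compressed bytes plays the role of the `cat` command above.

```python
import gzip

# Hypothetical sample payloads standing in for file1, file2, file3.
parts = [b"alpha\n", b"beta\n", b"gamma\n"]

# Compress each payload separately, as if each were its own .gz file.
members = [gzip.compress(p) for p in parts]

# Byte-level concatenation: the equivalent of `cat file1.gz file2.gz file3.gz`.
combined = b"".join(members)

# gzip.decompress reads every member in the stream, not just the first.
restored = gzip.decompress(combined)
print(restored == b"".join(parts))
```

The decompressed result is the plain concatenation of the inputs, with no marker showing where one file ended and the next began.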

Since existing tools generally ignore the filename headers of the additional members, it is not easy to extract individual files from the result. If you need that, build a ZIP file instead. ZIP and GZIP both use the DEFLATE algorithm for the actual compression (ZIP optionally supports other algorithms as well; method 8 is the one that corresponds to GZIP's compression). The difference is in the metadata format. Since the metadata is uncompressed, it's simple enough to strip off the gzip headers and tack on ZIP file headers and a central directory record instead. Refer to the gzip format specification and the ZIP format specification.
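The repackaging idea can be sketched roughly as follows. This is an illustration under stated assumptions, not a hardened tool: the function names `gzip_member_to_parts` and `gzips_to_zip` are made up for this example, each input is assumed to be a single-member gzip file smaller than 4 GiB with an ASCII name, the timestamp is a placeholder, and ZIP64 is not handled.

```python
import gzip
import io
import struct
import zipfile

def gzip_member_to_parts(gz: bytes):
    """Split ONE gzip member into (raw_deflate, crc32, uncompressed_size)."""
    if gz[:3] != b"\x1f\x8b\x08":
        raise ValueError("not a DEFLATE-compressed gzip member")
    flags = gz[3]
    pos = 10                                   # fixed-size part of the header
    if flags & 0x04:                           # FEXTRA: 2-byte length + data
        pos += 2 + struct.unpack_from("<H", gz, pos)[0]
    if flags & 0x08:                           # FNAME: zero-terminated string
        pos = gz.index(b"\x00", pos) + 1
    if flags & 0x10:                           # FCOMMENT: zero-terminated string
        pos = gz.index(b"\x00", pos) + 1
    if flags & 0x02:                           # FHCRC: 2-byte header CRC
        pos += 2
    crc, size = struct.unpack("<II", gz[-8:])  # trailer: CRC-32, then ISIZE
    return gz[pos:-8], crc, size

def gzips_to_zip(files: dict[str, bytes]) -> bytes:
    """Build a ZIP from {name: gzip bytes} without recompressing the data."""
    out = io.BytesIO()
    central = []
    dos_time, dos_date = 0, (1 << 5) | 1       # placeholder: 1980-01-01 00:00
    for name, gz in files.items():
        raw, crc, size = gzip_member_to_parts(gz)
        encoded = name.encode("ascii")
        offset = out.tell()
        # Local file header; compression method 8 is DEFLATE.
        out.write(struct.pack("<IHHHHHIIIHH", 0x04034B50, 20, 0, 8,
                              dos_time, dos_date, crc, len(raw), size,
                              len(encoded), 0))
        out.write(encoded)
        out.write(raw)
        # Matching central directory entry, pointing back at the local header.
        central.append(struct.pack("<IHHHHHHIIIHHHHHII", 0x02014B50, 20, 20,
                                   0, 8, dos_time, dos_date, crc, len(raw),
                                   size, len(encoded), 0, 0, 0, 0, 0, offset)
                       + encoded)
    cd_offset = out.tell()
    cd = b"".join(central)
    out.write(cd)
    # End-of-central-directory record.
    out.write(struct.pack("<IHHHHIIH", 0x06054B50, 0, 0,
                          len(central), len(central), len(cd), cd_offset, 0))
    return out.getvalue()

archive = gzips_to_zip({"file1": gzip.compress(b"alpha\n"),
                        "file2": gzip.compress(b"beta\n")})
with zipfile.ZipFile(io.BytesIO(archive)) as zf:
    print(zf.namelist())
    print(zf.read("file2"))
```

Reading the result back with the standard `zipfile` module is a useful sanity check, because it verifies each entry's CRC-32 while decompressing.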

bdonlan answered Sep 30 '22 12:09


Here is what man 1 gzip says about your requirement.

Multiple compressed files can be concatenated. In this case, gunzip will extract all members at once. For example:

gzip -c file1  > foo.gz
gzip -c file2 >> foo.gz

Then

gunzip -c foo 

is equivalent to

cat file1 file2 

The same applies to files that are already compressed: cat file1.gz file2.gz > foo.gz yields a file that gunzip treats the same way.

Note this part in particular:

gunzip will extract all members at once

So to extract the members individually, you will need an additional tool, or you will have to write one yourself.

However, this is also addressed in the man page.
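If you do want to write something yourself, one possible approach is to decompress one member at a time with Python's `zlib` module. This is only a sketch: `split_gzip_members` is a hypothetical helper name, it holds everything in memory, and it returns only the payloads, not the original filenames. It also assumes the input contains nothing but back-to-back gzip members; trailing padding or garbage would raise `zlib.error`.

```python
import gzip
import zlib

def split_gzip_members(data: bytes) -> list[bytes]:
    """Return the decompressed payload of each gzip member in `data`."""
    payloads = []
    while data:
        # 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
        member = zlib.decompressobj(16 + zlib.MAX_WBITS)
        payloads.append(member.decompress(data))
        # Bytes left over after this member's trailer belong to the next one.
        data = member.unused_data
    return payloads

# Hypothetical input: two separately compressed payloads, concatenated.
combined = gzip.compress(b"first file\n") + gzip.compress(b"second file\n")
print(split_gzip_members(combined))
```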

If you wish to create a single archive file with multiple members so that members can later be extracted independently, use an archiver such as tar or zip. GNU tar supports the -z option to invoke gzip transparently. gzip is designed as a complement to tar, not as a replacement.
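For comparison, here is a minimal sketch of the tar-plus-gzip route using Python's `tarfile` module. The file names and contents are hypothetical. Note that this works from uncompressed data, so existing .gz files would have to be decompressed first (or stored inside the tar as they are).

```python
import io
import tarfile

# Hypothetical in-memory files; in practice these would be the query results.
files = {"file1": b"alpha\n", "file2": b"beta\n"}

# Write a gzip-compressed tar archive into a buffer.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w:gz") as tar:
    for name, data in files.items():
        info = tarfile.TarInfo(name=name)
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))

# Read it back: members keep their names and can be extracted one by one.
buf.seek(0)
with tarfile.open(fileobj=buf, mode="r:gz") as tar:
    names = tar.getnames()
    second = tar.extractfile("file2").read()
print(names, second)
```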

Nehal Dattani answered Sep 30 '22 13:09