How to concat two or more gzip files/streams

I want to concatenate two or more gzip streams without recompressing them.

That is, I have A compressed to A.gz and B compressed to B.gz, and I want to combine them into a single gzip file (A+B).gz without compressing again, using C or C++.

Several notes:

Even though you can simply concatenate the two files and gunzip will know how to deal with them, most programs cannot handle two chunks.
I once saw example code that does this by only decompressing the files and then manipulating the original compressed data. This is significantly faster than normal recompression, but it still requires O(n) CPU work.
Unfortunately, I can't find that example again (concatenation using decompression only). If someone can point me to it, I would be grateful.

Note: this is not a duplicate of this, because the proposed solution does not fit my needs.

Clarification edit:

I want to concatenate several compressed HTML pieces and send them to the browser as one page, in reply to a request with "Accept-Encoding: gzip", using a response with "Content-Encoding: gzip".

If the streams are concatenated as simply as cat a.gz b.gz >ab.gz, the Gecko (Firefox) and KHTML web engines get only the first part (a); IE6 does not display anything; and Google Chrome displays the first part (a) correctly and the second part (b) as garbage (it is not decompressed at all).

Only Opera handles this well.

So I need to create a single gzip stream from several chunks and send it without recompressing.

Update: I found gzjoin.c in the zlib examples; it does this using only decompression. The problem is that decompression is still slower than a simple memcpy.

It is still 4 times faster than the fastest gzip compression, but that is not enough.

What I need is to find out what data I must save alongside each gzip file so that I don't have to run the decompression procedure, and how to obtain that data during compression.

294

asked Jul 17 '09 13:07

Artyom

1 Answer

Look at RFC 1951 (DEFLATE) and RFC 1952 (gzip).

The format is simply a series of members, each composed of three parts: a header, data, and a trailer. The data part is itself a sequence of chunks (called blocks in RFC 1951), each with its own header and data part.
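To make the member layout concrete, here is a minimal sketch that locates the three parts of a buffer holding exactly one gzip member. The names `GzipMember` and `parse_single_member` are made up for illustration; the field offsets and flag bits follow my reading of RFC 1952, and the code assumes the buffer contains a single complete member and nothing else.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Byte ranges of one gzip member (hypothetical helper type).
struct GzipMember {
    std::size_t data_begin;  // first byte of the raw deflate data
    std::size_t data_end;    // one past the last byte of the deflate data
    std::uint32_t crc32;     // trailer: CRC-32 of the uncompressed data
    std::uint32_t isize;     // trailer: uncompressed size modulo 2^32
};

// Read a little-endian 32-bit value, as used in the gzip trailer.
static std::uint32_t read_le32(const std::vector<std::uint8_t>& b, std::size_t pos) {
    return static_cast<std::uint32_t>(b[pos]) |
           (static_cast<std::uint32_t>(b[pos + 1]) << 8) |
           (static_cast<std::uint32_t>(b[pos + 2]) << 16) |
           (static_cast<std::uint32_t>(b[pos + 3]) << 24);
}

// Assumes `buf` holds exactly one member. Returns false if it is malformed.
bool parse_single_member(const std::vector<std::uint8_t>& buf, GzipMember& out) {
    const std::size_t n = buf.size();
    if (n < 18) return false;  // 10-byte fixed header + 8-byte trailer
    if (buf[0] != 0x1f || buf[1] != 0x8b || buf[2] != 8) return false;  // magic, CM = deflate
    const std::uint8_t flags = buf[3];
    std::size_t pos = 10;      // skip MTIME, XFL and OS as well
    if (flags & 0x04) {        // FEXTRA: 2-byte length, then that many bytes
        if (pos + 2 > n) return false;
        const std::size_t xlen = buf[pos] | (static_cast<std::size_t>(buf[pos + 1]) << 8);
        pos += 2 + xlen;
    }
    if (flags & 0x08) {        // FNAME: zero-terminated string
        while (pos < n && buf[pos] != 0) ++pos;
        ++pos;
    }
    if (flags & 0x10) {        // FCOMMENT: zero-terminated string
        while (pos < n && buf[pos] != 0) ++pos;
        ++pos;
    }
    if (flags & 0x02) pos += 2;  // FHCRC: 2-byte header CRC
    if (pos + 8 > n) return false;
    out.data_begin = pos;
    out.data_end = n - 8;
    out.crc32 = read_le32(buf, n - 8);
    out.isize = read_le32(buf, n - 4);
    return true;
}
```

Everything between `data_begin` and `data_end` is the raw deflate data that would have to be copied; the header and trailer are the parts that need adjusting.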

To simulate the effect of gzipping the concatenation of two (or more) files, you simply have to adjust the headers (there is a last-chunk flag, for instance) and the trailer correctly, and copy the data parts.
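The "last chunk flag" is the BFINAL bit of RFC 1951: each block starts with a 3-bit header (1 bit BFINAL, 2 bits BTYPE) packed from the least significant bit upward. Here is a rough sketch of reading and clearing that bit; the helper names are invented for this example. Block headers are not byte-aligned in general, so the bit position of the last block's header has to be known somehow, which as far as I can tell is one reason a tool would need to walk through the compressed data.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Read `count` bits (count <= 8 here) starting at absolute bit position `bitpos`.
// Deflate packs these header bits starting from the least significant bit.
unsigned read_bits(const std::vector<std::uint8_t>& buf, std::size_t bitpos, unsigned count) {
    unsigned value = 0;
    for (unsigned i = 0; i < count; ++i) {
        const std::size_t p = bitpos + i;
        const unsigned bit = (buf[p / 8] >> (p % 8)) & 1u;
        value |= bit << i;
    }
    return value;
}

// Hypothetical view of a deflate block header.
struct BlockHeader {
    bool is_final;  // BFINAL: set on the last block of the stream
    unsigned type;  // BTYPE: 0 = stored, 1 = fixed Huffman, 2 = dynamic Huffman
};

BlockHeader read_block_header(const std::vector<std::uint8_t>& buf, std::size_t bitpos) {
    BlockHeader h;
    h.is_final = read_bits(buf, bitpos, 1) != 0;
    h.type = read_bits(buf, bitpos + 1, 2);
    return h;
}

// Clear BFINAL for the block whose header starts at `bitpos`.
void clear_final_bit(std::vector<std::uint8_t>& buf, std::size_t bitpos) {
    buf[bitpos / 8] = static_cast<std::uint8_t>(buf[bitpos / 8] & ~(1u << (bitpos % 8)));
}
```

Clearing the bit alone does not make two streams joinable: the first stream may end in the middle of a byte, and the second stream's first block would have to continue at that exact bit position.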

There is a problem: the trailer contains a CRC32 of the uncompressed data, and I'm not sure whether it is easy to compute when you only know the CRCs of the parts.
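For what it's worth, the CRC of a concatenation can be derived from the CRCs of the parts plus the length of the second part, because the CRC register update is linear over GF(2). Below is a deliberately slow sketch (the function names are mine, not zlib's) that feeds `len_b` zero bytes through the first CRC and XORs in the second. zlib ships a `crc32_combine` function that should give the same result in logarithmic time.

```cpp
#include <cstdint>
#include <string>

// Reflected CRC-32 polynomial used by gzip.
const std::uint32_t kCrcPoly = 0xEDB88320u;

// Plain bitwise CRC-32 of a byte string (no lookup table, for clarity only).
std::uint32_t crc32_bytes(const std::string& data) {
    std::uint32_t crc = 0xFFFFFFFFu;
    for (unsigned char byte : data) {
        crc ^= byte;
        for (int k = 0; k < 8; ++k) {
            crc = (crc >> 1) ^ ((crc & 1u) ? kCrcPoly : 0u);
        }
    }
    return crc ^ 0xFFFFFFFFu;
}

// CRC-32 of A followed by B, given crc(A), crc(B) and the length of B in bytes.
// Runs in O(len_b); an illustration of the idea, not a production routine.
std::uint32_t crc32_combine_slow(std::uint32_t crc_a, std::uint32_t crc_b,
                                 std::uint64_t len_b) {
    // Advance crc_a as if len_b zero bytes had been appended after A.
    for (std::uint64_t i = 0; i < len_b * 8; ++i) {
        crc_a = (crc_a >> 1) ^ ((crc_a & 1u) ? kCrcPoly : 0u);
    }
    return crc_a ^ crc_b;
}
```

The other trailer field, ISIZE, is easier: it is the sum of the uncompressed lengths modulo 2^32. So if the CRC and uncompressed length of each piece are stored at compression time, the trailer of the joined stream can be built without touching the data.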

Edit: the comments in the gzjoin.c file you found imply that, while it is possible to compute the CRC32 without decompressing the data, other things still require decompression.

167

answered Oct 11 '22 00:10

AProgrammer

Tags: c++, concatenation, gzip
