It's my understanding that Brotli stores block-size information in a meta-block header that holds only the final uncompressed size of the block, with no information about the compressed length (9.2). I'm guessing that a wrapper would need to be created in order to use it with multiple threads, possibly something similar to Mark Adler's pigz.
Would the same threading principles apply to Brotli as they do with gzip in this case, or are there any foreseeable issues to be aware of when it comes to multithreading implementations?
You can use the brotli format as is for this purpose. I got them to add the option of putting metadata in empty meta-blocks (where "empty" means that the meta-block produces zero uncompressed data). You can put markers in metadata to aid in finding meta-blocks. An inserted empty meta-block also starts the next meta-block at a byte boundary.
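As a rough illustration of the empty meta-block described above, here is a minimal Python sketch. It assumes the bit layout from section 9.2 of the final format (RFC 7932): bits are packed least-significant first, an MNIBBLES field value of 3 marks an empty meta-block, and MSKIPBYTES/MSKIPLEN give the metadata length. `BitWriter`, `write_metadata_block` and `frame_stream` are hypothetical helpers, not part of any Brotli library. `frame_stream` assumes each chunk is a byte-aligned, non-final meta-block produced by a compressor that follows the independence rules, and it picks a 22-bit window arbitrarily.

```python
class BitWriter:
    """Packs bits LSB-first, the bit order used by the Brotli format."""

    def __init__(self) -> None:
        self.out = bytearray()
        self.acc = 0    # pending bits not yet flushed to `out`
        self.nbits = 0  # number of pending bits in `acc`

    def write(self, value: int, nbits: int) -> None:
        self.acc |= (value & ((1 << nbits) - 1)) << self.nbits
        self.nbits += nbits
        while self.nbits >= 8:
            self.out.append(self.acc & 0xFF)
            self.acc >>= 8
            self.nbits -= 8

    def align(self) -> None:
        """Pad with zero bits up to the next byte boundary."""
        if self.nbits:
            self.write(0, 8 - self.nbits)

    def write_bytes(self, data: bytes) -> None:
        assert self.nbits == 0, "raw bytes must start on a byte boundary"
        self.out += data

    def getvalue(self) -> bytes:
        assert self.nbits == 0, "stream not byte-aligned"
        return bytes(self.out)


def write_metadata_block(bw: BitWriter, metadata: bytes = b"") -> None:
    """Append an empty (MNIBBLES = 0) meta-block carrying `metadata`.

    It produces no uncompressed output and leaves the writer byte-aligned.
    """
    n = len(metadata)
    if n > 1 << 24:
        raise ValueError("metadata too long for MSKIPBYTES <= 3")
    # Minimal byte count for MSKIPLEN - 1 (at least 1 byte when n > 0).
    skip_bytes = 0 if n == 0 else max(1, ((n - 1).bit_length() + 7) // 8)
    bw.write(0, 1)           # ISLAST = 0
    bw.write(3, 2)           # MNIBBLES field 3 -> MNIBBLES = 0 (empty block)
    bw.write(0, 1)           # reserved bit, must be zero
    bw.write(skip_bytes, 2)  # MSKIPBYTES
    if skip_bytes:
        bw.write(n - 1, 8 * skip_bytes)  # MSKIPLEN - 1, little-endian
    bw.align()               # fill bits up to the byte boundary
    bw.write_bytes(metadata)  # metadata is not part of the output or window


def frame_stream(chunks: list[bytes]) -> bytes:
    """Hypothetical framing: header, aligning empty meta-block,
    byte-aligned non-final meta-block chunks, then a final empty block."""
    bw = BitWriter()
    bw.write(1, 1)  # WBITS: first bit 1 ...
    bw.write(5, 3)  # ... then 3 bits = 5, so WBITS = 17 + 5 = 22
    write_metadata_block(bw)  # brings the stream to a byte boundary
    for chunk in chunks:
        bw.write_bytes(chunk)
    bw.write(1, 1)  # ISLAST = 1
    bw.write(1, 1)  # ISLASTEMPTY = 1: the stream ends here
    bw.align()
    return bw.getvalue()
```

A marker such as `b"PIGZ"` placed in the metadata gives a parallel decompressor something to scan for; the marker value here is arbitrary.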
Each meta-block can be independent of the other meta-blocks. If the stream is constructed that way, then there is no issue with combining them when compressing or with decompressing them separately. The possible sources of dependency are the ring buffer of the last four distances used, and backward references past the beginning of the current meta-block. For parallel use, a meta-block can and must be constructed so as not to depend on the last four distances, by not referring to the ring buffer until it has been filled with distances from the current meta-block. In addition, distances that reach back before the current meta-block would not be allowed. This also rules out static dictionary references, since a static dictionary reference is encoded as a distance larger than the current maximum backward distance, which depends on how much data precedes the meta-block. Lastly, you would append an empty or metadata meta-block to bring the sequence to a byte boundary for easy concatenation.
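The independence rules above can be expressed as a quick validity check. This is a minimal sketch over a hypothetical, simplified command model (`Command` and `is_independent` are illustrative names, not a Brotli API): each command inserts literals and then optionally copies either from an explicit backward distance or by reusing a distance from the last-distance ring buffer. It treats ring-buffer reuse as exact reuse of a stored distance, although the real format also has codes for small offsets from the last two distances, and it applies the conservative rule of no ring-buffer use until four in-block distances have been pushed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Command:
    """Hypothetical simplified model of one insert-and-copy command."""
    insert_len: int          # literals emitted before the copy
    copy_len: int            # bytes copied from earlier output (0 = none)
    distance: Optional[int]  # explicit backward distance, or None when the
                             # command reuses a distance from the ring buffer


def is_independent(commands: list[Command]) -> bool:
    """Return True if the meta-block could be decoded on its own:

    * no ring-buffer (last-distance) use until four explicit distances
      from this meta-block have been pushed into the buffer;
    * no distance reaching before the start of this meta-block, which
      also excludes static dictionary references.
    """
    pos = 0     # bytes produced so far within this meta-block
    pushed = 0  # explicit in-block distances pushed into the ring buffer
    for cmd in commands:
        pos += cmd.insert_len
        if cmd.copy_len:
            if cmd.distance is None:
                if pushed < 4:
                    return False  # buffer may still hold earlier blocks' distances
            else:
                if not 1 <= cmd.distance <= pos:
                    return False  # reaches before this block (or the dictionary)
                pushed += 1
            pos += cmd.copy_len
    return True
```

A compressor handling one chunk of a larger input could run a check like this over the commands it intends to emit for each meta-block; any failing command would need to be re-encoded.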
By the way, it looks like you're linking to an older version of the draft format. Here is a link to the current version.