I was assigned to parallelize GZip in Java 7, and I am not sure which is possible.
The assignment is:
What I have tried:
I am not sure if I am did my approaches wrong or took the wrong approaches completely. Can anyone point me the right direction for which classes to use for this project?
A correct parallel implementation which does exactly what you ask is here:
https://github.com/shevek/parallelgzip
Yep, zipping a file with dictionary cannt be parallelized, as everything depends on everything. Maybe your teacher asked you to parallelize the individual gzipping of multiple files in a folder? That would be a great example of parallelized work.
To make a process concurrent, you need to have portions of code which can run concurrently and independently. Most compression algorithms are designed to be run sequentially, where every byte depends on every byte has comes before.
The only way to do compression concurrently is to change the algorythm (making it incompatible with existing approaches)
I think you can do it by inserting appropriate resets in the compression stream. The idea is that the underlying compression engine used in gzip allows the deflater to be reset, with an aim that it makes it easier to recover from stream corruption, though at a cost of making the compression ratio worse. After reset, the deflater will be in a known state and thus you could in fact start from that state (which is independent of the content being compressed) in multiple threads (and from many locations in the input data, of course) produce a compressed chunk and include the data produced when doing the following reset so that it takes the deflater back to the known state. Then you've just to reassemble the compressed pieces into the overall compressed stream. “Simple!” (Hah!)
I don't know if this will work, and I suspect that the complexity of the whole thing will make it not a viable choice except when you're compressing single very large files. (If you had many files, it would be much easier to just compress each of those in parallel.) Still, that's what I'd try first.
(Also note that the gzip format is just a deflated stream with extra metadata.)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With