Can you pre-compress data files to be inserted into a zip file at a later time to improve performance?

As part of our installer build, we have to zip thousands of large data files into about ten or twenty 'packages', each containing a few hundred (or even thousands of) files that must be kept together with the other files in their package. (They are versioned together, if you will.)

Then during the actual install, the user selects which packages they want included on their system. This also lets them download updates to the packages from our site as one large, versioned file rather than asking them to download thousands of individual ones which could also lead to them being out of sync with others in the same package.

Since these are data files, some of them change regularly during the design and coding stages, meaning we then have to re-compress all files in that particular zip package, even if only one file has changed. This makes the packaging step of our installer build take well over an hour each time, with most of that going to re-compressing things that we haven't touched.

We've looked into leaving the zip packages alone, then replacing specific files inside them, but inserting and removing large files from the middle of a zip doesn't give us that much of a performance boost. (A little, but not enough that it's worth it.)

I'm wondering if it's possible to pre-process files down into a cached raw 'compressed state' that matches how they would be written to the zip package, but only the data itself, not the zip header info, etc.

My thinking is that if that is possible, during our build step we would first look for any data file that doesn't have a compressed cache associated with it; for each such file, we would compress it and write the result to the cache.

Next we would simply append all of the caches together in a file stream, adding any appropriate zip header needed for the files.

This would mean we are still recreating the entire zip during each build, but we are only recompressing data that has changed. The rest would just be written as-is which is very fast since it is a straight write-to-disk. And if a data file changes, its cache is destroyed, so next build-pass it would be recreated.

However, I'm not sure such a thing is possible. Is it, and if so, is there any documentation to show how one would go about attempting this?

Mark A. Donohoe asked Oct 18 '13


1 Answer

Yes, that's possible. The most straightforward approach would be to zip each file individually into its own associated zip archive with one entry. When any file is modified, you replace its associated zip file to keep all of those up to date. Then you can write a simple program to take a set of those single entry zip files and merge them into a single zip file. You will need to refer to the documentation in the PKZip appnote. Take a look at that.

Now that you've read the appnote, here's what to do with the local header, data, and central header from each individual zip file:

1. Write each local header and its data, as-is, sequentially to the new zip file, saving the central header and the offset of the local header in the new file for each entry.

2. At the end of the new file, note the current offset, then write a new central directory using the central headers you saved, updating the local-header offsets appropriately.

3. End with a new end-of-central-directory record containing the offset of the start of the central directory.

Update:

I decided this was a useful enough thing to write. You can get it here.

Mark Adler answered Sep 27 '22