I want to allow users to download an archive of multiple large files at once. However, the files and the archive may be too large to store in memory or on disk on my server (they are streamed in from other servers on the fly). I'd like to generate the archive as I stream it to the user.
I can use tar or ZIP, or whatever is simplest. I am using Django, which allows me to return a generator or file-like object in my response. This object could be used to pump the process along. However, I am having trouble figuring out how to build this sort of thing around the zipfile or tarfile libraries, and I'm afraid they may not support reading files as they go, or reading the archive as it is built.
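For tar specifically, one way around the library's file-oriented API is to emit the archive bytes yourself and use tarfile only to build each member's header. Below is a minimal sketch, assuming each member's size is known before its data arrives (for example from the upstream server's Content-Length), because tar records the size in the header ahead of the data. The function name stream_tar and the (name, size, chunks) member tuples are hypothetical, chosen for illustration; TarInfo.tobuf is the standard library call that serializes a header.

```python
import tarfile
import time
from typing import Iterable, Iterator, Tuple

BLOCK = 512  # tar works in 512-byte blocks


def stream_tar(members: Iterable[Tuple[str, int, Iterable[bytes]]]) -> Iterator[bytes]:
    """Yield an uncompressed tar archive chunk by chunk (hypothetical helper).

    Each member is (name, size, chunks). The size must be known up front,
    since it is written into the header before any data is sent.
    """
    for name, size, chunks in members:
        info = tarfile.TarInfo(name=name)
        info.size = size
        info.mtime = int(time.time())
        # Header block(s) for this member, built by the standard library
        yield info.tobuf(format=tarfile.PAX_FORMAT)
        written = 0
        for chunk in chunks:
            if not chunk:
                continue
            written += len(chunk)
            # Errors raised here leave the client with a truncated archive
            if written > size:
                raise ValueError(f"{name}: more data than the declared {size} bytes")
            yield chunk
        if written != size:
            raise ValueError(f"{name}: got {written} bytes, expected {size}")
        # Pad the member data up to the next 512-byte boundary
        remainder = size % BLOCK
        if remainder:
            yield b"\0" * (BLOCK - remainder)
    # End-of-archive marker: two zero-filled blocks
    yield b"\0" * (BLOCK * 2)
```

Only one chunk is held in memory at a time. In a Django view, a generator like this could be passed to StreamingHttpResponse, e.g. StreamingHttpResponse(stream_tar(members), content_type="application/x-tar").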
This answer on converting an iterator to a file-like object might help. tarfile's TarFile.addfile takes a file-like object, but it appears to pass that straight to a copyfileobj helper that copies the whole member before returning, so this may not be as generator-friendly as I had hoped.
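The iterator-to-file-like conversion mentioned above can be sketched with the standard io module: subclass io.RawIOBase, implement readinto to pull from the iterator, and wrap the result in io.BufferedReader. The names IterStream and iterable_to_stream are hypothetical; this is one common way to do it, not necessarily what the linked answer uses.

```python
import io
from typing import Iterable


class IterStream(io.RawIOBase):
    """Read-only raw stream that pulls bytes from an iterable of chunks."""

    def __init__(self, chunks: Iterable[bytes]):
        self._it = iter(chunks)
        self._leftover = b""

    def readable(self) -> bool:
        return True

    def readinto(self, buf) -> int:
        try:
            # Skip empty chunks until there is data or the iterator ends
            while not self._leftover:
                self._leftover = next(self._it)
        except StopIteration:
            return 0  # EOF
        n = min(len(buf), len(self._leftover))
        buf[:n] = self._leftover[:n]
        self._leftover = self._leftover[n:]
        return n


def iterable_to_stream(chunks: Iterable[bytes], buffer_size: int = io.DEFAULT_BUFFER_SIZE) -> io.BufferedReader:
    """Wrap an iterable of bytes in a buffered, file-like reader."""
    return io.BufferedReader(IterStream(chunks), buffer_size=buffer_size)
```

The result could be given to TarFile.addfile as its fileobj argument. This handles the input side only: addfile still copies the whole member into the archive's output before returning, so the output side needs its own streaming strategy.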
It's not possible to completely stream-write ZIP files. Small bits of metadata for each member file, such as its name, must also be placed at the end of the ZIP, in the central directory. To handle this, stream-zip buffers this metadata in memory until it can be output.
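The same idea can be sketched with the standard library's zipfile instead of stream-zip. zipfile can write to a non-seekable output, and ZipFile.open(..., mode="w") accepts member data piece by piece. Like stream-zip, it keeps each member's metadata in memory and writes the central directory only when the archive is closed. The sink class, the stream_zip name, and the (name, chunks) member tuples are hypothetical; this is a minimal sketch, not stream-zip's own API.

```python
import io
import zipfile
from typing import Iterable, Iterator, List, Tuple


class _Sink(io.RawIOBase):
    """Write-only, non-seekable buffer that the generator drains after writes."""

    def __init__(self) -> None:
        self._chunks: List[bytes] = []

    def writable(self) -> bool:
        return True

    def write(self, b) -> int:
        self._chunks.append(bytes(b))
        return len(b)

    def drain(self) -> bytes:
        data = b"".join(self._chunks)
        self._chunks.clear()
        return data


def stream_zip(members: Iterable[Tuple[str, Iterable[bytes]]]) -> Iterator[bytes]:
    sink = _Sink()
    # The sink cannot seek, so zipfile writes each member's CRC and sizes in a
    # data descriptor after the data instead of patching the local header.
    with zipfile.ZipFile(sink, mode="w") as zf:
        for name, chunks in members:
            info = zipfile.ZipInfo(name)
            info.compress_type = zipfile.ZIP_DEFLATED
            # force_zip64 lets a member exceed about 2 GiB without a known size
            with zf.open(info, mode="w", force_zip64=True) as dest:
                for chunk in chunks:
                    dest.write(chunk)
                    data = sink.drain()
                    if data:  # the compressor may buffer internally
                        yield data
            data = sink.drain()  # remaining compressed bytes + data descriptor
            if data:
                yield data
    # Closing the ZipFile wrote the central directory (names, offsets, sizes),
    # which zipfile kept in memory until now.
    yield sink.drain()
```

Memory use stays roughly at one chunk plus the per-member metadata, which matches the trade-off described above.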
Archival programs are often used to back up data. You can use an archive to back up a folder or a number of files into a single file, compressing them as well. This saves space and lets you store that single file on a floppy disk or other removable media.
I ended up using SpiderOak ZipStream.