I'm trying to create a gzipped tar file without it taking up a lot of RAM. The Bash equivalent of what I want to do is:
tar -cf - -C $INPUT . | gzip -cv - > $OUTPUT
I'm using the tar and flate2 libraries, which both say they support streaming. I cannot figure out how to stream one into the other. I have tried looking at the Write implementors, but do not see a type of stream that fits my needs.
My current implementation has the desired output (namely a .tar.gz file), but it uses up a lot of RAM, especially when the file size is large. The created file also gives "tar: Unexpected EOF in archive" when the input size is large, but is fine with small inputs. This indicates to me that it is not piping the streams as Bash would.
use flate2::write::GzEncoder;
use flate2::Compression;
use std::fs::File;
use tar::Builder;
// Create tar archive
let mut archive = Builder::new(Vec::new());
archive.append_dir_all("myfiles", "myfiles")?;
// Gzip tar archive and write to file
let compressed_file = File::create("backup.tar.gz")?;
let mut encoder = GzEncoder::new(compressed_file, Compression::Default);
encoder.write(&archive.into_inner()?)?;
encoder.finish()?;
To understand why you are using so much RAM and why tar reports an error for large files, let's look at what exactly your code is doing:
let mut archive = Builder::new(Vec::new());
Looking at the Builder::new documentation, we can already see the main problem: "Create a new archive builder with the underlying object as the destination of all data written". Since you are passing a Vec (which implements Write), all of the tar archive data is written into that vector, and the vector lives in RAM.
archive.append_dir_all("myfiles", "myfiles")?;
This line already writes the files into the vector (tar only archives, it does not compress), so this is where the RAM fills up.
Skipping a few lines:
encoder.write(&archive.into_inner()?)?;
Here you tell the encoder to write the vector you just filled. But it is important to remember that Write::write() makes no guarantee about how much data is written: it returns the number of bytes it accepted, which may be fewer than you passed in. It is a lower-level building block for more reliable higher-level functions. You want to use write_all() instead, which repeatedly calls write() until all data is written. Since you are calling write() only once, only part of the data is written. When you have very little data, it can usually be written all at once, but once you have more data, the bug becomes noticeable.
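The difference between write() and write_all() can be reproduced with the standard library alone. The following is a minimal sketch, not flate2's actual behavior: ShortWriter is a hypothetical sink, invented for illustration, that accepts at most max_per_call bytes per call, and the two helper functions are likewise made up for this example.

```rust
use std::io::{self, Write};

/// Hypothetical sink that accepts at most `max_per_call` bytes per `write()` call,
/// mimicking a writer that performs short writes.
struct ShortWriter {
    received: Vec<u8>,
    max_per_call: usize,
}

impl Write for ShortWriter {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        let n = buf.len().min(self.max_per_call);
        self.received.extend_from_slice(&buf[..n]);
        Ok(n) // may be fewer bytes than the caller offered
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

/// Number of bytes that reach the sink after a single `write()` call.
fn bytes_after_write(data: &[u8], max_per_call: usize) -> io::Result<usize> {
    let mut sink = ShortWriter { received: Vec::new(), max_per_call };
    let _accepted = sink.write(data)?; // the return value is easy to ignore by accident
    Ok(sink.received.len())
}

/// Number of bytes that reach the sink after `write_all()`.
fn bytes_after_write_all(data: &[u8], max_per_call: usize) -> io::Result<usize> {
    let mut sink = ShortWriter { received: Vec::new(), max_per_call };
    sink.write_all(data)?; // loops over write() until every byte is accepted
    Ok(sink.received.len())
}
```

With ten bytes of input and a limit of four bytes per call, bytes_after_write reports 4 while bytes_after_write_all reports 10, which is the same kind of truncation that produces "Unexpected EOF in archive".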
So what to do instead? Simple: Builder::new() expects something that implements Write and uses that as its destination. Your gzip encoder (GzEncoder) implements Write. Thus, this should work:
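// Create Gzip file
let compressed_file = File::create("backup.tar.gz")?;
// With flate2 1.0 and later this is Compression::default(); older releases used Compression::Default
let mut encoder = GzEncoder::new(compressed_file, Compression::default());
{
    // Create tar archive; everything it writes goes straight into the gzip encoder
    let mut archive = Builder::new(&mut encoder);
    archive.append_dir_all("myfiles", "myfiles")?;
    // Write the tar trailer explicitly so an I/O error is reported instead of being swallowed on drop
    archive.finish()?;
}
// Finish Gzip file
encoder.finish()?;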
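To see why chaining writers like this keeps memory usage flat, here is a std-only sketch of the same idea. It makes several assumptions: PassThrough is a hypothetical stand-in for GzEncoder (it forwards bytes instead of compressing them), io::copy plays the role of the tar Builder, and io::sink() stands in for the output file. None of these names come from tar or flate2.

```rust
use std::io::{self, Read, Write};

/// Hypothetical stand-in for an encoder such as GzEncoder: it hands every chunk
/// to the inner writer right away and keeps only two counters, never the data.
struct PassThrough<W: Write> {
    inner: W,
    forwarded: u64,
    largest_chunk: usize,
}

impl<W: Write> Write for PassThrough<W> {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        let n = self.inner.write(buf)?;
        self.forwarded += n as u64;
        self.largest_chunk = self.largest_chunk.max(n);
        Ok(n)
    }

    fn flush(&mut self) -> io::Result<()> {
        self.inner.flush()
    }
}

/// Streams `total` zero bytes through the pass-through stage into `io::sink()`.
/// Returns (total bytes forwarded, largest single chunk seen by the stage).
fn stream_through(total: u64) -> io::Result<(u64, usize)> {
    let mut stage = PassThrough { inner: io::sink(), forwarded: 0, largest_chunk: 0 };
    let mut source = io::repeat(0).take(total);
    // io::copy moves the data in small chunks; its buffer size is an
    // implementation detail of the standard library.
    io::copy(&mut source, &mut stage)?;
    stage.flush()?;
    Ok((stage.forwarded, stage.largest_chunk))
}
```

Calling stream_through(1 << 20) forwards the full mebibyte, yet the stage only ever sees one small chunk at a time. The real tar Builder and GzEncoder keep some internal buffers of their own, but the principle is the same: no stage holds the whole archive.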