
How can I pipe a tar compression operation to aws s3 cp?

I'm writing a custom backup script in bash for personal use. The goal is to compress the contents of a directory via tar/gzip, split the compressed archive, then upload the parts to AWS S3.

On my first try writing this script a few months ago, I was able to get it working via something like:

# Compress the directory to stdout and split it into 100 MB numbered parts on scratch disk
tar -czf - /mnt/STORAGE_0/dir_to_backup | split -b 100M -d -a 4 - /mnt/SCRATCH/backup.tgz.part
# Upload the parts, deleting remote objects that no longer exist locally, then clean up
aws s3 sync /mnt/SCRATCH/ s3://backups/ --delete
rm /mnt/SCRATCH/*

This worked well for my purposes, but it required /mnt/SCRATCH to have enough disk space to hold the compressed archive. I now want to improve the script so it no longer relies on free space in /mnt/SCRATCH, and after some research I ended up with something like:

# Stream each 100 MB chunk straight to S3 via split's --filter, with no scratch disk needed
tar -czf - /mnt/STORAGE_0/dir_to_backup | split -b 100M -d -a 4 --filter "aws s3 cp - s3://backups/backup.tgz.part" -

This almost works, but the target filename in my S3 bucket is not dynamic, so the command just overwrites backup.tgz.part several times while running. The end result is a single 100 MB file instead of the intended series of 100 MB parts with suffixes like .part0001.

Any guidance would be much appreciated. Thanks!

asked Jul 17 '19 by alonzoc1


1 Answer

When using split's --filter option, you can use the environment variable $FILE to get the generated file name. See the split man page:

--filter=COMMAND
     write to shell COMMAND; file name is $FILE
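
To see the variable in action, here is a minimal local sketch (not part of the original answer; the input and chunk size are arbitrary):

# Each invocation of the filter command receives one chunk on stdin,
# with $FILE set to the name split would otherwise have written.
seq 1 1000 | split -l 100 -d -a 4 --filter 'echo "$FILE received $(wc -l) lines"' -
# prints: x0000 received 100 lines ... x0009 received 100 lines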

For your use case you could use something like the following:

--filter 'aws s3 cp - s3://backups/backup.tgz.part$FILE'

(The single quotes are needed; otherwise the shell would expand $FILE immediately, before split ever runs, instead of passing the literal string through to the filter command.)
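
A quick way to confirm the quoting behavior (a sketch; echo just prints the string that would reach split):

echo "aws s3 cp - s3://backups/backup.tgz.part$FILE"   # double quotes: $FILE expands now, usually to nothing
echo 'aws s3 cp - s3://backups/backup.tgz.part$FILE'   # single quotes: the literal $FILE reaches split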

This will generate the following file names on S3 (the x comes from split's default output prefix, since the command passes no prefix argument):

backup.tgz.partx0000
backup.tgz.partx0001
backup.tgz.partx0002
...

Full example:

# Each chunk is piped to its own "aws s3 cp" invocation, named via $FILE
tar -czf - /mnt/STORAGE_0/dir_to_backup | split -b 100M -d -a 4 --filter 'aws s3 cp - s3://backups/backup.tgz.part$FILE' -
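
For completeness, restoring such a backup can stream the parts back in order without scratch space. This is a hedged sketch, not part of the original answer; /restore/target is a hypothetical destination and the part naming is assumed from the listing above:

# List the part objects, sort them so the fixed-width numeric suffixes
# concatenate in order, stream each one from S3, and unpack the result.
# (stdin is redirected from /dev/null so aws cannot swallow the part list)
aws s3 ls s3://backups/ | awk '{print $4}' | grep '^backup\.tgz\.partx' | sort |
  while read -r part; do aws s3 cp "s3://backups/$part" - </dev/null; done |
  tar -xz -C /restore/target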
answered Oct 29 '22 by Turtlefight