I'm writing a custom backup script in bash for personal use. The goal is to compress the contents of a directory via tar/gzip, split the compressed archive, then upload the parts to AWS S3.
On my first try writing this script a few months ago, I was able to get it working via something like:
tar -czf - /mnt/STORAGE_0/dir_to_backup | split -b 100M -d -a 4 - /mnt/SCRATCH/backup.tgz.part
aws s3 sync /mnt/SCRATCH/ s3://backups/ --delete
rm /mnt/SCRATCH/*
This worked well for my purposes, but required /mnt/SCRATCH to have enough disk space to store the compressed archive. Now I want to improve this script so it doesn't rely on having enough space in /mnt/SCRATCH. After some research, I ended up with something like:
tar -czf - /mnt/STORAGE_0/dir_to_backup | split -b 100M -d -a 4 --filter "aws s3 cp - s3://backups/backup.tgz.part" -
This almost works, but the target filename on my S3 bucket is not dynamic, and it seems to just overwrite the backup.tgz.part file several times while running. The end result is just one 100MB file, instead of the intended several 100MB files with endings like .part0001.
Any guidance would be much appreciated. Thanks!
When using split you can use the environment variable $FILE to get the generated file name. See the split man page:

--filter=COMMAND
       write to shell COMMAND; file name is $FILE
For your use case you could use something like the following:

--filter 'aws s3 cp - s3://backups/backup.tgz.part$FILE'

(the single quotes are needed; otherwise your shell would expand $FILE immediately, before split ever runs, instead of leaving it for the filter's shell to expand once per chunk)
This will generate the following file names on S3 (the x comes from split's default output prefix, since no prefix argument is passed to split):

backup.tgz.partx0000
backup.tgz.partx0001
backup.tgz.partx0002
...
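If you want to sanity-check that naming locally before anything is uploaded, you can swap the aws call for a throwaway filter. A minimal sketch, assuming GNU split; seq just supplies a few MB of dummy input and the echo/wc pair stands in for the upload command:

seq 1 2000000 | split -b 1M -d -a 4 --filter 'echo "would upload backup.tgz.part$FILE ($(wc -c) bytes)"' -

Each chunk prints a line like "would upload backup.tgz.partx0000 (1048576 bytes)", confirming that $FILE changes for every chunk.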
Full example:
tar -czf - /mnt/STORAGE_0/dir_to_backup | split -b 100M -d -a 4 --filter 'aws s3 cp - s3://backups/backup.tgz.part$FILE' -
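Since $FILE is the full output name split would have generated, prefix included, another option (a variation that follows directly from the man page, not something you must do) is to hand the prefix to split itself and keep the S3 URL generic, which avoids the default x entirely:

tar -czf - /mnt/STORAGE_0/dir_to_backup | split -b 100M -d -a 4 --filter 'aws s3 cp - s3://backups/$FILE' - backup.tgz.part

That should produce objects named backup.tgz.part0000, backup.tgz.part0001, and so on, matching the naming you originally intended.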
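For completeness, restoring is the reverse: stream the parts back in order and feed them to tar. A rough sketch, assuming the backup.tgz.part naming above and a hypothetical /restore/target directory; the zero-padded suffixes mean a plain sort already gives the right order:

for part in $(aws s3 ls s3://backups/ | awk '{print $4}' | grep '^backup\.tgz\.part' | sort); do
    # stream each part straight from S3 to stdout, in suffix order
    aws s3 cp "s3://backups/$part" -
done | tar -xzf - -C /restore/target

This keeps the no-scratch-space property of the backup side, since each part is piped straight from S3 into tar.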