I have critical data in an Amazon S3 bucket. I want to make a weekly backup of its contents to another cloud service, or even within S3 itself. The best option would be to sync my bucket to a new bucket in a different region, in case of data loss.
How can I do that?
If you want to create an AWS S3 backup, you can use one of these methods:
- Enable AWS S3 versioning to preserve older versions of objects so they can be restored.
- Configure AWS S3 replication from one S3 bucket to another.
- Use the sync command in the AWS Command Line Interface (CLI) to copy files from AWS S3 to an EC2 instance.
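For example, versioning can be turned on for an existing bucket with a single AWS CLI call (the bucket name here is a placeholder):

aws s3api put-bucket-versioning --bucket my-critical-bucket --versioning-configuration Status=Enabled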
Amazon S3 is natively integrated with AWS Backup, a fully managed, policy-based service that you can use to centrally define backup policies to protect your data in Amazon S3.
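If you go the AWS Backup route, a weekly schedule is a backup plan plus a resource selection. A rough sketch with the AWS CLI (the plan name, vault name, role ARN, and bucket ARN are placeholders; check the AWS Backup documentation for the exact JSON schema):

aws backup create-backup-plan --backup-plan '{"BackupPlanName": "weekly-s3-backup", "Rules": [{"RuleName": "weekly", "TargetBackupVaultName": "Default", "ScheduleExpression": "cron(0 5 ? * 1 *)"}]}'
aws backup create-backup-selection --backup-plan-id <plan-id-from-previous-call> --backup-selection '{"SelectionName": "s3-buckets", "IamRoleArn": "arn:aws:iam::123456789012:role/service-role/AWSBackupDefaultServiceRole", "Resources": ["arn:aws:s3:::my-critical-bucket"]}'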
Sync objects
The s3 sync command synchronizes the contents of a bucket and a directory, or the contents of two buckets. Typically, s3 sync copies only missing or outdated files or objects between the source and the target.
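A bucket-to-bucket copy is as simple as (bucket names are placeholders):

aws s3 sync s3://my-critical-bucket s3://my-critical-bucket-backup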
Depending on your use case, you can perform the data transfer between buckets using one of the following options:
- Run parallel uploads using the AWS Command Line Interface (AWS CLI).
- Use an AWS SDK.
- Use cross-Region replication or same-Region replication.
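Replication is configured on the source bucket; both buckets need versioning enabled, and S3 needs an IAM role it can assume. A rough sketch with the AWS CLI (the role ARN and bucket names are placeholders):

aws s3api put-bucket-replication --bucket my-critical-bucket --replication-configuration '{"Role": "arn:aws:iam::123456789012:role/s3-replication-role", "Rules": [{"ID": "backup-all", "Prefix": "", "Status": "Enabled", "Destination": {"Bucket": "arn:aws:s3:::my-critical-bucket-backup"}}]}'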
I prefer to back up locally using sync, so that only changed objects are transferred. That is not a perfect backup solution, but you can schedule periodic runs later as needed:
s3cmd sync --delete-removed s3://your-bucket-name/ /path/to/myfolder/
If you have never used s3cmd, install and configure it with:
pip install s3cmd
s3cmd --configure
There are also third-party S3 backup services for around $5/month, but I would also check Amazon Glacier, which lets you store a single archive of up to about 40 TB if you use multipart upload (see the sketch after the link below):
http://docs.aws.amazon.com/amazonglacier/latest/dev/uploading-archive-mpu.html#qfacts
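A multipart upload to Glacier is started roughly like this (the vault name is a placeholder and the 1 MB part size is just an example); the parts are then sent with upload-multipart-part and the archive is finalized with complete-multipart-upload:

aws glacier initiate-multipart-upload --account-id - --vault-name my-backup-vault --archive-description "weekly backup" --part-size 1048576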
Remember, if your S3 account is compromised, you risk losing all of your data, because a sync would then mirror an empty folder or corrupted files. So you had better write a script that keeps several archived copies of your backup, e.g. one per week by detecting the start of the week.
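A minimal sketch of that idea, assuming s3cmd is already configured: a cron entry runs the sync every Monday into a folder named after the ISO week, so older weekly copies are kept (the paths and bucket name below are placeholders):

# crontab entry: run the backup script at 03:00 every Monday
0 3 * * 1 /home/me/bin/weekly-s3-backup.sh

# /home/me/bin/weekly-s3-backup.sh
#!/bin/sh
WEEK=$(date +%G-W%V)    # ISO year and week number, e.g. 2016-W03
mkdir -p /backups/s3/"$WEEK"
s3cmd sync --delete-removed s3://your-bucket-name/ /backups/s3/"$WEEK"/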
Update 01/17/2016:
The Python-based AWS CLI is very mature now.
Please use: https://github.com/aws/aws-cli
Example: aws s3 sync s3://mybucket .
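To match the original question (a weekly copy into a bucket in a different region), the same command can also sync bucket to bucket; --source-region and --region tell the CLI where each bucket lives (bucket names and regions below are placeholders):

aws s3 sync s3://mybucket s3://mybucket-backup --source-region us-east-1 --region eu-west-1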
This script backs up an S3 bucket:
#!/usr/bin/env python
from boto.s3.connection import S3Connection
import re
import datetime
import sys
import time

def main():
    s3_ID = sys.argv[1]
    s3_key = sys.argv[2]
    src_bucket_name = sys.argv[3]
    num_backup_buckets = sys.argv[4]
    connection = S3Connection(s3_ID, s3_key)
    delete_oldest_backup_buckets(connection, num_backup_buckets)
    backup(connection, src_bucket_name)

def delete_oldest_backup_buckets(connection, num_backup_buckets):
    """Deletes the oldest backup buckets such that only the newest
    NUM_BACKUP_BUCKETS - 1 buckets remain."""
    buckets = connection.get_all_buckets()  # returns a list of bucket objects
    backup_bucket_names = []
    for bucket in buckets:
        if re.search('backup-' + r'\d{4}-\d{2}-\d{2}', bucket.name):
            backup_bucket_names.append(bucket.name)

    backup_bucket_names.sort(key=lambda x: datetime.datetime.strptime(x[len('backup-'):17], '%Y-%m-%d').date())

    # The bucket names are now sorted oldest to newest, so keep only the last NUM_BACKUP_BUCKETS - 1
    delete = len(backup_bucket_names) - (int(num_backup_buckets) - 1)
    if delete <= 0:
        return

    for i in range(0, delete):
        print 'Deleting the backup bucket, ' + backup_bucket_names[i]
        connection.delete_bucket(backup_bucket_names[i])

def backup(connection, src_bucket_name):
    now = datetime.datetime.now()
    # the month and day must be zero-filled
    new_backup_bucket_name = 'backup-' + ('%04d-%02d-%02d' % (now.year, now.month, now.day))
    print "Creating new bucket " + new_backup_bucket_name
    new_backup_bucket = connection.create_bucket(new_backup_bucket_name)
    copy_bucket(src_bucket_name, new_backup_bucket_name, connection)

def copy_bucket(src_bucket_name, dst_bucket_name, connection, maximum_keys=100):
    src_bucket = connection.get_bucket(src_bucket_name)
    dst_bucket = connection.get_bucket(dst_bucket_name)

    result_marker = ''
    while True:
        # list at most maximum_keys objects, starting after the last key of the previous page
        keys = src_bucket.get_all_keys(max_keys=maximum_keys, marker=result_marker)

        for k in keys:
            print 'Copying ' + k.key + ' from ' + src_bucket_name + ' to ' + dst_bucket_name
            t0 = time.clock()
            dst_bucket.copy_key(k.key, src_bucket_name, k.key)
            print time.clock() - t0, ' seconds'

        # fewer keys than requested means this was the last page
        if len(keys) < maximum_keys:
            print 'Done backing up.'
            break

        result_marker = keys[maximum_keys - 1].key

if __name__ == '__main__':
    main()
I use this in a rake task (for a Rails app):
desc "Back up a file onto S3" task :backup do S3ID = "AKIAJM3FAKEFAKENRWVQ" S3KEY = "0A5kuzV+F1pbaMjZxHQAZfakedeJd0dfakeNpry" SRCBUCKET = "primary-mzgd" NUM_BACKUP_BUCKETS = 2 Dir.chdir("#{Rails.root}/lib/tasks") system "./do_backup.py #{S3ID} #{S3KEY} #{SRCBUCKET} #{NUM_BACKUP_BUCKETS}" end