I have critical data in an Amazon S3 bucket. I want to make a weekly backup of its contents to another cloud service, or even within S3 itself. The best option would be to sync my bucket to a new bucket in a different region, in case of data loss.
How can I do that?
If you want to create an AWS S3 backup, you can use one of these methods:
- Enable AWS S3 versioning to preserve older versions of objects so they can be restored.
- Configure AWS S3 replication from one S3 bucket to another.
- Use the sync command in the AWS Command Line Interface (CLI) to copy files from AWS S3 to an EC2 instance.
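For example, versioning can be turned on for an existing bucket with a single AWS CLI call (the bucket name here is a placeholder):

aws s3api put-bucket-versioning --bucket my-critical-bucket --versioning-configuration Status=Enabled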
Amazon S3 is natively integrated with AWS Backup, a fully managed, policy-based service that you can use to centrally define backup policies to protect your data in Amazon S3.
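If you go the AWS Backup route, a weekly schedule is a backup plan plus a resource selection. A rough sketch with the AWS CLI (the plan name, vault name, role ARN, and bucket ARN are placeholders; check the AWS Backup documentation for the exact JSON schema):

aws backup create-backup-plan --backup-plan '{"BackupPlanName": "weekly-s3-backup", "Rules": [{"RuleName": "weekly", "TargetBackupVaultName": "Default", "ScheduleExpression": "cron(0 5 ? * 1 *)"}]}'
aws backup create-backup-selection --backup-plan-id <plan-id-from-previous-call> --backup-selection '{"SelectionName": "s3-buckets", "IamRoleArn": "arn:aws:iam::123456789012:role/service-role/AWSBackupDefaultServiceRole", "Resources": ["arn:aws:s3:::my-critical-bucket"]}'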
Sync objects
The s3 sync command synchronizes the contents of a bucket and a directory, or the contents of two buckets. Typically, s3 sync copies only missing or outdated files or objects between the source and the target.
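A bucket-to-bucket copy is as simple as (bucket names are placeholders):

aws s3 sync s3://my-critical-bucket s3://my-critical-bucket-backup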
Depending on your use case, you can perform the data transfer between buckets using one of the following options:
- Run parallel uploads using the AWS Command Line Interface (AWS CLI).
- Use an AWS SDK.
- Use cross-Region replication or same-Region replication.
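Replication is configured on the source bucket; both buckets need versioning enabled, and S3 needs an IAM role it can assume. A rough sketch with the AWS CLI (the role ARN and bucket names are placeholders):

aws s3api put-bucket-replication --bucket my-critical-bucket --replication-configuration '{"Role": "arn:aws:iam::123456789012:role/s3-replication-role", "Rules": [{"ID": "backup-all", "Prefix": "", "Status": "Enabled", "Destination": {"Bucket": "arn:aws:s3:::my-critical-bucket-backup"}}]}'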
I prefer to back up locally using sync, so that only changed objects are transferred. That is not a perfect backup solution, but you can schedule periodic runs later as needed:
s3cmd sync --delete-removed s3://your-bucket-name/ /path/to/myfolder/
If you have never used s3cmd, install and configure it with:
pip install s3cmd
s3cmd --configure
There are also third-party S3 backup services for around $5/month, but I would also check Amazon Glacier, which lets you store a single archive of up to about 40 TB if you use multipart upload (see the sketch after the link below):
http://docs.aws.amazon.com/amazonglacier/latest/dev/uploading-archive-mpu.html#qfacts
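A multipart upload to Glacier is started roughly like this (the vault name is a placeholder and the 1 MB part size is just an example); the parts are then sent with upload-multipart-part and the archive is finalized with complete-multipart-upload:

aws glacier initiate-multipart-upload --account-id - --vault-name my-backup-vault --archive-description "weekly backup" --part-size 1048576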
Remember, if your S3 account is compromised, you risk losing all of your data, because a sync would then mirror an empty folder or corrupted files. So you had better write a script that keeps several archived copies of your backup, e.g. one per week by detecting the start of the week.
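A minimal sketch of that idea, assuming s3cmd is already configured: a cron entry runs the sync every Monday into a folder named after the ISO week, so older weekly copies are kept (the paths and bucket name below are placeholders):

# crontab entry: run the backup script at 03:00 every Monday
0 3 * * 1 /home/me/bin/weekly-s3-backup.sh

# /home/me/bin/weekly-s3-backup.sh
#!/bin/sh
WEEK=$(date +%G-W%V)    # ISO year and week number, e.g. 2016-W03
mkdir -p /backups/s3/"$WEEK"
s3cmd sync --delete-removed s3://your-bucket-name/ /backups/s3/"$WEEK"/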
Update 01/17/2016:
The Python-based AWS CLI is very mature now.
Please use: https://github.com/aws/aws-cli
Example: aws s3 sync s3://mybucket .
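To match the original question (a weekly copy into a bucket in a different region), the same command can also sync bucket to bucket; --source-region and --region tell the CLI where each bucket lives (bucket names and regions below are placeholders):

aws s3 sync s3://mybucket s3://mybucket-backup --source-region us-east-1 --region eu-west-1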
This script backs up an S3 bucket:
#!/usr/bin/env python
from boto.s3.connection import S3Connection
import re
import datetime
import sys
import time

def main():
    s3_ID = sys.argv[1]
    s3_key = sys.argv[2]
    src_bucket_name = sys.argv[3]
    num_backup_buckets = sys.argv[4]
    connection = S3Connection(s3_ID, s3_key)
    delete_oldest_backup_buckets(connection, num_backup_buckets)
    backup(connection, src_bucket_name)

def delete_oldest_backup_buckets(connection, num_backup_buckets):
    """Deletes the oldest backup buckets such that only the newest
    NUM_BACKUP_BUCKETS - 1 buckets remain."""
    buckets = connection.get_all_buckets()  # returns a list of bucket objects
    backup_bucket_names = []
    for bucket in buckets:
        if re.search('backup-' + r'\d{4}-\d{2}-\d{2}', bucket.name):
            backup_bucket_names.append(bucket.name)

    backup_bucket_names.sort(key=lambda x: datetime.datetime.strptime(x[len('backup-'):17], '%Y-%m-%d').date())

    # The bucket names are now sorted oldest to newest, so keep only the last NUM_BACKUP_BUCKETS - 1
    delete = len(backup_bucket_names) - (int(num_backup_buckets) - 1)
    if delete <= 0:
        return

    for i in range(0, delete):
        print 'Deleting the backup bucket, ' + backup_bucket_names[i]
        connection.delete_bucket(backup_bucket_names[i])

def backup(connection, src_bucket_name):
    now = datetime.datetime.now()
    # the month and day must be zero-filled
    new_backup_bucket_name = 'backup-' + ('%04d-%02d-%02d' % (now.year, now.month, now.day))
    print "Creating new bucket " + new_backup_bucket_name
    new_backup_bucket = connection.create_bucket(new_backup_bucket_name)
    copy_bucket(src_bucket_name, new_backup_bucket_name, connection)

def copy_bucket(src_bucket_name, dst_bucket_name, connection, maximum_keys=100):
    src_bucket = connection.get_bucket(src_bucket_name)
    dst_bucket = connection.get_bucket(dst_bucket_name)

    result_marker = ''
    while True:
        # list at most maximum_keys objects, starting after the last key of the previous page
        keys = src_bucket.get_all_keys(max_keys=maximum_keys, marker=result_marker)

        for k in keys:
            print 'Copying ' + k.key + ' from ' + src_bucket_name + ' to ' + dst_bucket_name
            t0 = time.clock()
            dst_bucket.copy_key(k.key, src_bucket_name, k.key)
            print time.clock() - t0, ' seconds'

        # fewer keys than requested means this was the last page
        if len(keys) < maximum_keys:
            print 'Done backing up.'
            break

        result_marker = keys[maximum_keys - 1].key

if __name__ == '__main__':
    main()
I use this in a rake task (for a Rails app):
desc "Back up a file onto S3" task :backup do S3ID = "AKIAJM3FAKEFAKENRWVQ" S3KEY = "0A5kuzV+F1pbaMjZxHQAZfakedeJd0dfakeNpry" SRCBUCKET = "primary-mzgd" NUM_BACKUP_BUCKETS = 2 Dir.chdir("#{Rails.root}/lib/tasks") system "./do_backup.py #{S3ID} #{S3KEY} #{SRCBUCKET} #{NUM_BACKUP_BUCKETS}" end