How can I back up or sync an Amazon S3 bucket?

I have critical data in an Amazon S3 bucket. I want to make a weekly backup of its contents to another cloud service, or even to another bucket inside S3. The best approach would be to sync my bucket to a new bucket in a different region, in case of data loss.

How can I do that?

asked Aug 05 '12 by VAAA


People also ask

How do I back up my S3 bucket?

If you want to create an AWS S3 backup, you can use one of these methods: enable AWS S3 versioning to preserve older versions of files that can be restored; configure AWS S3 replication from one S3 bucket to another; or use the sync tool in the AWS command-line interface (CLI) to copy files from AWS S3 to an EC2 instance.

Does AWS S3 have backup?

Amazon S3 is natively integrated with AWS Backup, a fully managed, policy-based service that you can use to centrally define backup policies to protect your data in Amazon S3.

Which command is used to sync an Amazon S3 bucket?

The s3 sync command synchronizes the contents of a bucket and a directory, or the contents of two buckets. Typically, s3 sync copies missing or outdated files or objects between the source and the target.

Can you copy data from one S3 bucket to another?

Depending on your use case, you can perform the data transfer between buckets using one of the following options: run parallel uploads using the AWS Command Line Interface (AWS CLI); use an AWS SDK; or use cross-Region replication or same-Region replication.
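
As a minimal illustration of two of the options above (versioning and a bucket-to-bucket copy), here is a hedged boto3 sketch; the bucket names are placeholders, and it assumes boto3 is installed and AWS credentials are configured:

import boto3

s3 = boto3.client('s3')

# Preserve older versions of objects so they can be restored later.
s3.put_bucket_versioning(
    Bucket='my-source-bucket',  # placeholder name
    VersioningConfiguration={'Status': 'Enabled'},
)

# Copy every object into a backup bucket (possibly in another region).
# list_objects_v2 returns at most 1000 keys per call, so paginate.
paginator = s3.get_paginator('list_objects_v2')
for page in paginator.paginate(Bucket='my-source-bucket'):
    for obj in page.get('Contents', []):
        s3.copy_object(
            Bucket='my-backup-bucket',  # placeholder name
            Key=obj['Key'],
            CopySource={'Bucket': 'my-source-bucket', 'Key': obj['Key']},
        )

Note that copy_object is limited to objects of up to 5 GB; for larger objects you would need a multipart copy (boto3's managed transfer method copy handles that automatically).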


2 Answers

I prefer to back up locally with sync, so that only changes are transferred. It is not a perfect backup solution, but you can add periodic runs later as you need:

s3cmd sync --delete-removed s3://your-bucket-name/ /path/to/myfolder/ 

If you have never used s3cmd, install and configure it with:

pip install s3cmd
s3cmd --configure

There should also be hosted S3 backup services for around $5/month, but I would also check Amazon Glacier, which lets you store a single archive of up to 40,000 GB (about 40 TB) if you use multipart upload.

http://docs.aws.amazon.com/amazonglacier/latest/dev/uploading-archive-mpu.html#qfacts
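
For context, here is a minimal boto3 sketch of a simple (non-multipart) Glacier upload; the vault and file names are placeholders, and large archives need the multipart API from the link above:

import boto3

glacier = boto3.client('glacier')

# Upload a single archive in one request (limited to 4 GB; the multipart
# API described at the link above is required for anything larger).
with open('weekly-backup.tar.gz', 'rb') as archive:  # placeholder file
    response = glacier.upload_archive(
        vaultName='my-backup-vault',  # placeholder vault
        archiveDescription='weekly S3 backup',
        body=archive,
    )

# Keep the returned archive ID; you need it to retrieve or delete the archive.
print(response['archiveId'])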

Remember that if your S3 account is compromised, you can lose all of your data, because a sync would faithfully mirror an empty folder or malformed files. So you had better write a script that keeps several rotated archives of your backup, e.g. by detecting the start of the week, as sketched below.
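
A minimal Python 3 sketch of that idea, assuming s3cmd is configured and the paths are placeholders; it keeps one snapshot per ISO week and prunes all but the newest few:

#!/usr/bin/env python3
import datetime
import os
import shutil
import subprocess

BACKUP_ROOT = '/path/to/backups'   # placeholder; must hold only snapshots
BUCKET = 's3://your-bucket-name/'  # placeholder
KEEP = 4                           # number of weekly snapshots to keep

# Name the snapshot after the ISO year and week, e.g. 2016-W03.
year, week, _ = datetime.date.today().isocalendar()
snapshot = os.path.join(BACKUP_ROOT, '%d-W%02d' % (year, week))
if not os.path.isdir(snapshot):
    os.makedirs(snapshot)

# Sync the bucket into this week's snapshot directory.
subprocess.check_call(['s3cmd', 'sync', '--delete-removed', BUCKET, snapshot + '/'])

# Prune everything except the newest KEEP snapshots (names sort by date).
for old in sorted(os.listdir(BACKUP_ROOT))[:-KEEP]:
    shutil.rmtree(os.path.join(BACKUP_ROOT, old))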

Update 01/17/2016:

The Python-based AWS CLI is very mature now.

Please use: https://github.com/aws/aws-cli
Example: aws s3 sync s3://mybucket .

answered Sep 28 '22 by hurturk


This script backs up an S3 bucket:

#!/usr/bin/env python
# Backs up an S3 bucket by copying it into a new, dated backup bucket
# and pruning the oldest backup buckets. Uses the legacy boto library.
from boto.s3.connection import S3Connection
import re
import datetime
import sys
import time

def main():
    s3_ID = sys.argv[1]
    s3_key = sys.argv[2]
    src_bucket_name = sys.argv[3]
    num_backup_buckets = sys.argv[4]
    connection = S3Connection(s3_ID, s3_key)
    delete_oldest_backup_buckets(connection, num_backup_buckets)
    backup(connection, src_bucket_name)

def delete_oldest_backup_buckets(connection, num_backup_buckets):
    """Deletes the oldest backup buckets such that only the newest
    NUM_BACKUP_BUCKETS - 1 buckets remain."""
    buckets = connection.get_all_buckets()  # returns a list of bucket objects

    backup_bucket_names = []
    for bucket in buckets:
        if re.search(r'backup-\d{4}-\d{2}-\d{2}', bucket.name):
            backup_bucket_names.append(bucket.name)

    # Sort by the date embedded in the bucket name, earliest first.
    backup_bucket_names.sort(key=lambda x: datetime.datetime.strptime(x[len('backup-'):17], '%Y-%m-%d').date())

    # Keep the newest NUM_BACKUP_BUCKETS - 1 buckets; delete the rest.
    delete = len(backup_bucket_names) - (int(num_backup_buckets) - 1)
    if delete <= 0:
        return

    for i in range(0, delete):
        print 'Deleting the backup bucket, ' + backup_bucket_names[i]
        connection.delete_bucket(backup_bucket_names[i])

def backup(connection, src_bucket_name):
    now = datetime.datetime.now()
    # The month and day must be zero-filled so the names sort correctly.
    new_backup_bucket_name = 'backup-%04d-%02d-%02d' % (now.year, now.month, now.day)
    print 'Creating new bucket ' + new_backup_bucket_name
    connection.create_bucket(new_backup_bucket_name)
    copy_bucket(src_bucket_name, new_backup_bucket_name, connection)

def copy_bucket(src_bucket_name, dst_bucket_name, connection, maximum_keys=100):
    src_bucket = connection.get_bucket(src_bucket_name)
    dst_bucket = connection.get_bucket(dst_bucket_name)

    # Page through the source bucket, copying maximum_keys objects at a time.
    result_marker = ''
    while True:
        keys = src_bucket.get_all_keys(max_keys=maximum_keys, marker=result_marker)

        for k in keys:
            print 'Copying ' + k.key + ' from ' + src_bucket_name + ' to ' + dst_bucket_name

            t0 = time.clock()
            dst_bucket.copy_key(k.key, src_bucket_name, k.key)
            print time.clock() - t0, ' seconds'

        if len(keys) < maximum_keys:
            print 'Done backing up.'
            break

        result_marker = keys[maximum_keys - 1].key

if __name__ == '__main__':
    main()
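
If you save the script as do_backup.py (the name the rake task below uses) and make it executable, it is invoked with the four arguments read from sys.argv:

./do_backup.py <s3_id> <s3_key> <src_bucket_name> <num_backup_buckets>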

I use this in a rake task (for a Rails app):

desc "Back up a file onto S3" task :backup do      S3ID = "AKIAJM3FAKEFAKENRWVQ"      S3KEY = "0A5kuzV+F1pbaMjZxHQAZfakedeJd0dfakeNpry"      SRCBUCKET = "primary-mzgd"      NUM_BACKUP_BUCKETS = 2       Dir.chdir("#{Rails.root}/lib/tasks")      system "./do_backup.py #{S3ID} #{S3KEY} #{SRCBUCKET} #{NUM_BACKUP_BUCKETS}" end 
answered Sep 28 '22 by Rose Perrone