
Can Amazon Glacier mirror an Amazon S3 bucket?

I'd like to mirror an S3 bucket with Amazon Glacier.

The Glacier FAQ states:

Amazon S3 now provides a new storage option that enables you to utilize Amazon Glacier’s extremely low-cost storage service for data archiving. You can define S3 lifecycle rules to automatically archive sets of Amazon S3 objects to Amazon Glacier to reduce your storage costs. You can learn more by visiting the Object Lifecycle Management topic in the Amazon S3 Developer Guide.

This is close, but I'd like to mirror. I do not want to delete the content on S3, only copy it to Glacier.

Is it possible to set this up automatically with AWS?

Or do I need to upload the mirrored content to Glacier manually?

Justin Tanner asked Mar 10 '13 18:03



People also ask

Is Glacier the same as S3?

Amazon S3 is a durable, secure, simple, and fast storage service, while Amazon S3 Glacier is intended for archiving. Use S3 if you need low latency or frequent access to your data. Use S3 Glacier when you want low storage cost and do not require millisecond access to your data.

How do I transfer data from S3 to Glacier?

Store the data in 2 "folders" in S3, "standard" and "glacier". Set a lifecycle policy to push all objects in the "glacier" folder to Glacier data storage ASAP. When you want to move an object from standard to glacier, copy it to the glacier folder and delete the object in the standard folder (there's no "move" API).
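
The two-"folder" approach above can be sketched roughly as follows. This assumes boto3 and a client created with `boto3.client("s3")`; the function names, the `standard/` and `glacier/` prefixes, and the rule ID are illustrative choices, not anything AWS prescribes. `Days=0` is meant to express "as soon as possible" and is an assumption worth checking against the lifecycle documentation.

```python
def glacier_prefix_rule(prefix="glacier/", days=0):
    """Build a lifecycle rule that transitions everything under `prefix` to Glacier."""
    return {
        "ID": "archive-" + prefix.strip("/"),  # hypothetical rule name
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        "Transitions": [{"Days": days, "StorageClass": "GLACIER"}],
    }


def move_to_glacier_prefix(s3_client, bucket, key,
                           src_prefix="standard/", dst_prefix="glacier/"):
    """Emulate a "move": copy the object under dst_prefix, then delete the original."""
    if not key.startswith(src_prefix):
        raise ValueError("key %r is not under %r" % (key, src_prefix))
    new_key = dst_prefix + key[len(src_prefix):]
    s3_client.copy_object(
        Bucket=bucket,
        Key=new_key,
        CopySource={"Bucket": bucket, "Key": key},
    )
    s3_client.delete_object(Bucket=bucket, Key=key)
    return new_key


# Usage (not run here; needs AWS credentials and a real bucket):
#   import boto3
#   s3 = boto3.client("s3")
#   s3.put_bucket_lifecycle_configuration(
#       Bucket="my-bucket",
#       LifecycleConfiguration={"Rules": [glacier_prefix_rule()]},
#   )
#   move_to_glacier_prefix(s3, "my-bucket", "standard/reports/2013.tgz")
```

Note that `put_bucket_lifecycle_configuration` replaces the bucket's whole lifecycle configuration, so any existing rules need to be included in the same call.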

How do you replicate an S3 bucket?

Sign in to the AWS Management Console and open the Amazon S3 console at https://console.aws.amazon.com/s3/ . In the Buckets list, choose the name of the bucket that you want. Choose Management, scroll down to Replication rules, and then choose Create replication rule.


3 Answers

It is now possible to achieve an S3 to Glacier "mirror" in two steps. First, create a cross-region replication bucket on Amazon S3; this replication bucket will be a mirror of your original bucket (see http://docs.aws.amazon.com/AmazonS3/latest/dev/crr.html). Then set up a lifecycle rule on the replication bucket to move its data to Glacier.
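
As a rough sketch of that setup with boto3, under several assumptions: both buckets already exist, `role_arn` names an IAM role that allows S3 to replicate between them, and `s3_client` comes from `boto3.client("s3")`. The function names, rule IDs, and bucket names are hypothetical. Replication requires versioning on both buckets, and as far as I know it only applies to objects written after the rule is in place.

```python
def replication_config(role_arn, dest_bucket):
    """Replication settings that mirror every new object into dest_bucket."""
    return {
        "Role": role_arn,
        "Rules": [{
            "ID": "mirror-everything",  # hypothetical rule name
            "Priority": 1,
            "Filter": {"Prefix": ""},   # empty prefix: all objects
            "Status": "Enabled",
            "DeleteMarkerReplication": {"Status": "Disabled"},
            "Destination": {"Bucket": "arn:aws:s3:::" + dest_bucket},
        }],
    }


def glacier_lifecycle_config(days=0):
    """Lifecycle settings that transition every object to Glacier after `days`."""
    return {
        "Rules": [{
            "ID": "to-glacier",  # hypothetical rule name
            "Filter": {"Prefix": ""},
            "Status": "Enabled",
            "Transitions": [{"Days": days, "StorageClass": "GLACIER"}],
        }],
    }


def configure_mirror(s3_client, source_bucket, dest_bucket, role_arn):
    """Enable versioning, replicate source -> dest, and archive dest to Glacier."""
    for bucket in (source_bucket, dest_bucket):
        s3_client.put_bucket_versioning(
            Bucket=bucket,
            VersioningConfiguration={"Status": "Enabled"},
        )
    s3_client.put_bucket_replication(
        Bucket=source_bucket,
        ReplicationConfiguration=replication_config(role_arn, dest_bucket),
    )
    s3_client.put_bucket_lifecycle_configuration(
        Bucket=dest_bucket,
        LifecycleConfiguration=glacier_lifecycle_config(),
    )


# Usage (not run here; needs AWS credentials, two buckets, and an IAM role):
#   import boto3
#   configure_mirror(boto3.client("s3"), "my-bucket", "my-bucket-mirror",
#                    "arn:aws:iam::123456789012:role/my-replication-role")
```

The original bucket keeps its objects in standard S3 storage; only the replica is transitioned, which is what makes this a mirror rather than a move.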

Jordan Magnuson answered Oct 22 '22 04:10



Amazon doesn't offer this feature through its API. We had the same problem and solved it by running a daily cron job that re-uploads files to Glacier.

Here is a snippet of code you can run using Python and boto to copy a file to a Glacier vault. Note that the code below only uploads a local file to Glacier, so you first have to download the file from S3 (you can use s3cmd, for instance).

# Import the Glacier submodule explicitly; a bare "import boto" does not load it
import boto.glacier.layer2

# Set up your AWS key and secret, and vault name
aws_key = "AKIA1234"
aws_secret = "ABC123"
glacierVault = "someName"

# Assumption is that this file has been downloaded from S3
fileName = "localfile.tgz"

try:
    # Connect to Glacier through boto's Layer2 interface
    layer2 = boto.glacier.layer2.Layer2(
        aws_access_key_id=aws_key,
        aws_secret_access_key=aws_secret,
    )

    # Get your Glacier vault
    vault = layer2.get_vault(glacierVault)

    # Upload file using concurrent upload (so large files are OK)
    archiveID = vault.concurrent_create_archive_from_file(fileName)

    # Append this archiveID to a local file, that way you remember what file
    # in Glacier corresponds to a local file. Glacier has no concept of files.
    with open("glacier.txt", "a") as index:
        index.write(fileName + " " + archiveID + "\n")
except Exception as exc:
    # Report the underlying error instead of silently swallowing it
    print("Could not upload gzipped file to Glacier: %s" % exc)
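
For readers on boto3 rather than the older boto library, one pass of such a cron job might look roughly like the sketch below. It is not the code from this answer: `mirror_object_to_glacier` and its parameters are hypothetical, the clients are assumed to come from `boto3.client("s3")` and `boto3.client("glacier")`, and the object key is assumed to be printable ASCII so it can double as the archive description. A single `upload_archive` call is limited to 4 GB, so larger objects would need Glacier's multipart upload instead.

```python
import os
import tempfile


def mirror_object_to_glacier(s3_client, glacier_client, bucket, key, vault,
                             index_path="glacier.txt"):
    """Download one S3 object to a temp file, upload it to a Glacier vault,
    and record the key -> archive ID mapping in a local index file."""
    with tempfile.TemporaryDirectory() as tmp:
        local_path = os.path.join(tmp, os.path.basename(key) or "object")
        s3_client.download_file(bucket, key, local_path)
        with open(local_path, "rb") as body:
            response = glacier_client.upload_archive(
                vaultName=vault,
                archiveDescription=key,  # assumes printable ASCII
                body=body,
            )
    archive_id = response["archiveId"]
    # Glacier has no concept of file names, so keep our own lookup table.
    with open(index_path, "a") as index:
        index.write(key + " " + archive_id + "\n")
    return archive_id


# Usage (not run here; needs AWS credentials, a bucket, and a vault):
#   import boto3
#   mirror_object_to_glacier(boto3.client("s3"), boto3.client("glacier"),
#                            "my-bucket", "backups/localfile.tgz", "someName")
```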
Suman answered Oct 22 '22 05:10



This is done with a lifecycle policy, but the object is then no longer available in S3. To keep it, duplicate it into a separate bucket.
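
A minimal sketch of that duplication step with boto3, assuming `s3_client` is `boto3.client("s3")` and both buckets already exist; `duplicate_bucket` and the bucket names are hypothetical. It should run before the lifecycle transition, since objects already in the Glacier storage class cannot be copied without being restored first, and a single `copy_object` call only handles objects up to 5 GB.

```python
def duplicate_bucket(s3_client, source_bucket, backup_bucket, prefix=""):
    """Copy every object under `prefix` from source_bucket into backup_bucket.

    Returns the list of keys that were copied.
    """
    copied = []
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=source_bucket, Prefix=prefix):
        # "Contents" is absent when a page (or the whole bucket) is empty.
        for obj in page.get("Contents", []):
            s3_client.copy_object(
                Bucket=backup_bucket,
                Key=obj["Key"],
                CopySource={"Bucket": source_bucket, "Key": obj["Key"]},
            )
            copied.append(obj["Key"])
    return copied


# Usage (not run here; needs AWS credentials and two real buckets):
#   import boto3
#   duplicate_bucket(boto3.client("s3"), "my-bucket", "my-bucket-copy")
```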

Ahmed Al Hafoudh answered Oct 22 '22 05:10
