
Backup: Amazon S3 or Glacier - lots of little files?

I'm trying to understand the complicated Amazon Glacier pricing model. I don't want to store a huge amount of data, just a few GB, say 10. I hope never to download the files, and if I did need to, I wouldn't care how long it took.

Is there a cost per file I upload? Is it cheaper to zip lots of tiny files and upload them in a few chunks, or does uploading, say, 10,000 images not matter? (I cannot get a straight answer to this from searching.)

Am I able to request the download of a whole Archive/Bucket or is it file-by-file?

Markive asked Feb 01 '13 17:02




1 Answer

I know this is a bit old, but I hope you still find my answer helpful. The other answer is based on S3, which I believe wasn't your question.

Glacier is intended for rare file access. With that in mind, it effectively penalizes you if you need to retrieve many files at once. In your particular case I would suggest uploading 10,000 separate files instead of, let's say, 100 ZIP files with 100 files each. The reason is simple: Glacier lets you retrieve only 5% of your total stored data for free each month, and that allowance is prorated daily. So if, for example, you need to download 10 photos you took on a weekend, you could get those 10 photos for free if they are stored as separate files in the vault. On the other hand, if they sit inside a ZIP file with 100 photos, you are forced to download the whole ZIP, which will probably exceed the free allowance, meaning you'll pay retrieval fees.
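To make the free-allowance reasoning concrete, here is a rough sketch of the arithmetic for the 10 GB case from the question. It assumes the 5% figure applies per month and is spread evenly over a 30-day month, and it assumes a photo size of 1.5 MB; the function names and the photo size are made up for illustration, and the real billing rules have more detail than this.

```python
def daily_free_allowance_gb(stored_gb, monthly_free_fraction=0.05, days_in_month=30):
    """Free retrieval allowance per day in GB.

    Assumes 5% of stored data per month, prorated evenly over a
    30-day month (a simplification of the actual billing rules).
    """
    return stored_gb * monthly_free_fraction / days_in_month


def fits_in_free_allowance(retrieval_gb, stored_gb):
    """True if a single day's retrieval stays within the daily free allowance."""
    return retrieval_gb <= daily_free_allowance_gb(stored_gb)


stored_gb = 10.0   # total stored, from the question
photo_mb = 1.5     # hypothetical average photo size

ten_photos_gb = 10 * photo_mb / 1024       # 10 separately stored photos
zip_of_100_gb = 100 * photo_mb / 1024      # one ZIP holding 100 photos

print(f"Daily free allowance: {daily_free_allowance_gb(stored_gb) * 1024:.1f} MB")
print("10 separate photos free?", fits_in_free_allowance(ten_photos_gb, stored_gb))
print("ZIP of 100 photos free?", fits_in_free_allowance(zip_of_100_gb, stored_gb))
```

Under these assumptions the ten separate photos fit inside one day's allowance, while the single ZIP does not, which is the trade-off described above.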

The only reason it makes sense to upload fewer files is to avoid a high number of upload requests (10,000 files usually means 10,000 requests). Requests are charged at $0.05 per 1,000. These fees are much lower than retrieval fees (taking into account the limits imposed), which is why I would always recommend uploading separate files. Of course, you may zip files that make sense to keep together.
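As a quick check on the request fees, this small sketch compares the two upload strategies using the $0.05 per 1,000 requests figure quoted above. It assumes one request per uploaded file and ignores any other charges; the function name is hypothetical.

```python
def upload_request_cost(num_files, price_per_1000=0.05):
    """Upload request cost in USD, assuming one request per file."""
    return num_files / 1000 * price_per_1000


separate = upload_request_cost(10_000)   # 10,000 individual files
zipped = upload_request_cost(100)        # 100 ZIP files of 100 files each

print(f"10,000 separate files: ${separate:.2f}")
print(f"100 ZIP files:         ${zipped:.3f}")
print(f"Extra cost of uploading separately: ${separate - zipped:.3f}")
```

With these numbers, uploading every file separately costs roughly fifty cents in requests, a one-time fee that is small next to what a retrieval over the free allowance could cost.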

Retrieval costs are very complex in Amazon Glacier. There is a good explanation here: http://aws.amazon.com/glacier/faqs/#How_much_data_can_I_retrieve_for_free But even there you'll need to work through the calculations carefully to get a clear idea of how costs are billed.

Regarding this question: Am I able to request the download of a whole Archive/Bucket or is it file-by-file?

Requests are file-by-file, although you can select many files at once and download them together.

Deciding whether to use S3 or Glacier really depends on how often you need to access your files. If you will rarely need access to them, then Glacier is your answer. Otherwise, for 10 GB, S3 can still be cheap and is more flexible than Glacier. In my case, I find family photos to be very precious. That's why I have a 100 GB backup on Glacier with all my family photos. I don't intend to access it unless there is some kind of disaster at home. In that case, I don't think I would mind the retrieval cost if it saved something I really care about. But that's just me.

Sirkong answered Sep 28 '22 21:09