Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

AWS S3 Standard Infrequent Access vs Reduced Redundancy storage class when coupled with CloudFront?

I'm using CloudFront to cache and distribute all of my thumbnails currently stored on S3 in Standard storage class. Since CloudFront caches originals and accesses them only every 24 hours, it makes sense to use a cheaper storage class than Standard: either Standard Infrequent Access (IA) or Reduced Redundancy (RR). But I'm not sure which one would be more suitable and cost effective.

Standard-IA has the cheapest storage among all (58% cheaper than Standard class and 47% cheaper than RR), but 60% more expensive requests than both Standard and RR. However, all files under 128kb stored in Standard-IA class are rounded to 128kb when calculating cost, which would apply to most of my thumbnail images.

Meanwhile, storage in RR class is only 20% cheaper than Standard, but its request cost is 60% cheaper than that of Standard-IA.

I'm unsure which one would be most cost effective in practice and would appreciate anyone with experience using both to give some feedback.

like image 529
Sbbs Avatar asked Dec 09 '15 00:12

Sbbs


People also ask

What is the benefit of using the S3 standard infrequent access storage class?

Infrequent access S3 Standard-IA offers the high durability, high throughput, and low latency of S3 Standard, with a low per GB storage price and per GB retrieval charge.

What is the reduced redundancy option in Amazon S3?

Reduced Redundancy Storage (RRS) is a new storage option within Amazon S3 that enables customers to reduce their costs by storing non-critical, reproducible data at lower levels of redundancy than Amazon S3's standard storage.

How many days must an object remain in standard infrequent access storage class before transitioning to Glacier storage class?

Before you transition objects from the S3 Standard or S3 Standard-IA storage classes to S3 Standard-IA or S3 One Zone-IA, you must store them at least 30 days in the S3 Standard storage class.

Which S3 storage class takes the most time to retrieve data?

Which of the following S3 storage classes takes the most time to retrieve data (also known as first byte latency)? "S3 Glacier Deep Archive" - S3 Glacier Deep Archive is Amazon S3's lowest-cost storage class and supports long-term retention and digital preservation for data that may be accessed once or twice in a year.


2 Answers

There's a problem with the premise of your question. The fact that CloudFront may cache your objects for some period of time actually has little relevance when selecting an S3 storage class.

REDUCED_REDUNDANCY is sometimes less expensive¹ because S3 stores your data on fewer physical devices, reducing the reliability somewhat in exchange for lower pricing... and in the event of failures, the object is statistically more likely to be lost by S3. If S3 loses the object because of the reduced redundancy, CloudFront will at some point begin returning errors.

The deciding factor in choosing this storage class is whether the object is easily replaced.

Reduced Redundancy Storage (RRS) is an Amazon S3 storage option that enables customers to reduce their costs by storing noncritical, reproducible data at lower levels of redundancy than Amazon S3’s standard storage. It provides a cost-effective, highly available solution for distributing or sharing content that is durably stored elsewhere, or for storing thumbnails, transcoded media, or other processed data that can be easily reproduced.

https://aws.amazon.com/s3/reduced-redundancy/

STANDARD_IA (infrequent access) is less expensive for a different reason: the storage savings are offset by retrieval charges. If an object is downloaded more than once per month, the combined charge will exceed the cost of STANDARD. It is intended for objects that will genuinely be accessed infrequently. Since CloudFront has multiple edge locations, each with its own independent cache,² whether an object is "currently stored in" CloudFront is not a question with a simple yes/no answer. It is also not possible to "game the system" by specifying large Cache-Control: max-age values. CloudFront has no charge for its cache storage, so it's only sensible that an object can be purged from the cache before the expiration time you specify. Indeed, anecdotal observations confirm what the docs indicate, that objects are sometimes purged from CloudFront due to a relative lack of "popularity."

The deciding factor in choosing this storage class is whether the increased data transfer (retrieval) charges will be low enough to justify the storage charge savings that they offset. Unless the object is expected to be downloaded less than once or twice a month, this storage class does not represent a cost savings.

Standard/Infrequent Access should be reserved for things you really don't expect to be needed often, like tarballs and database dumps and images unlikely to be reviewed after they are first accessed, such as (borrowing an example from my world) a proof-of-purchase/receipt scanned and submitted by a customer for a rebate claim. Once the rebate has been approved, it's very unlikely we'll need to look at that receipt again, but we do need to keep it on file. Hello, Standard_IA. (Note that S3 does this automatically for me, after the file has been stored for 30 days, using a lifecycle policy on the bucket).

Standard - IA is ideally suited for long-term file storage, older data from sync and share, backup data, and disaster recovery files.

https://aws.amazon.com/s3/faqs/#sia

Side note: one alternative mechanism for saving some storage cost is to gzip -9 the content before storing, and set Content-Encoding: gzip. I have been doing this for years with S3 and am still waiting for my first support ticket to come in reporting a browser that can't handle it. Even content that is allegedly already compressed -- such as .xlsx spreadsheets -- will often shrink a little bit, and every byte you squeeze out means slightly lower storage and download bandwidth charges.

Fundamentally, if your content is easily replaceable, such as resized images where you still have the original... or reports that can easily be rerun from source data... or content backed up elsewhere (AWS is essentially always my first choice for cloud services, but I do have backups of my S3 assets stored in another cloud provider's storage service, for example)... then reduced redundancy is a good option.


¹ REDUCED_REDUNDANCY is sometimes less expensive only in some regions as of late 2016. Prior to that, it was priced lower than STANDARD, but in an odd quirk of the strange world of competitive pricing, as a result of S3 price reductions announced in November, 2016, in some AWS regions, the STANDARD storage class is now slightly less expensive than REDUCED_REDUNDANCY ("RRS"). For example, in us-east-1, Standard was reduced from $0.03/GB to $0.023/GB, but RRS remained at $0.024/GB... leaving no obvious reason to ever use RRS in that region. The structure of the pricing pages leaves the impression that RRS may no longer be considered a current-generation offering by AWS. Indeed, it's an older offering than both STANDARD_IA and GLACIER. It is unlikely to ever be fully deprecated or eliminated, but they may not be inclined to reduce its costs to a point that lines up with the other storage classes if it's no longer among their primary offerings.

² "CloudFront has multiple edge locations, each with its own independent cache" is still a technically true statement, but CloudFront quietly began to roll out and then announced some significant architectural changes in late 2016, with the introduction of the regional edge caches. It is now, in a sense, "less true" that the global edge caches are indepenent. They still are, but it makes less of a difference, since CloudFront is now a two-tier network, with the global (outer tier) edge nodes sometimes fetching content from the regional (inner tier) edge nodes, instead of directly from the origin server. This should have the impact of increasing the likelihood of an object being considered to be "in" the cache, since a cache miss in the outer tier might be transformed into a hit by the inner tier, which is also reported to have more available cache storage space than some or all of the outer tier. It is not yet clear from external observations how much of an impact this has on hit rates on S3 origins, as the documentation indicates the regional edges are not used for S3 (only custom origins) but it seems less than clear that this universally holds true, particularly with the introduction of Lambda@Edge. It might be significant, but as of this writing, I do not believe it to have any material impact on my answer to the question presented here.

like image 57
Michael - sqlbot Avatar answered Nov 16 '22 02:11

Michael - sqlbot


Since CloudFront caches originals and accesses them only every 24 hours

You can actually make CloudFront cache things for much longer if you want. You just need to add metadata to your objects that sets a Cache Control header, and according to the S3 documentation you can specify an age up to 100 years. You simply set a max-age in seconds, so if you really want to have your objects cached for 100 years:

Cache-Control: max-age=3153600000

As for your main question regarding SIA vs. RR, you've pretty much hit on all the differences between the two. It's just a matter of calculating the costs of using one vs. the other. You'll just need to run some calculations and see what the cost estimates are. If you have 100 thumbnails all under 128K then SIA will charge you for 100 * 128K bytes, whereas RR will just charge you for the costs of the total size of those 100 thumbnails. Similarly, if you set a fairly high cache timeout in CloudFront then you may see only 10 fetches from S3 each day, so SIA would charge you for retrieval of 10 * 128K bytes each day while RR would only charge you for the cost of the size of those 10 thumbnails.

Using some real numbers based on the size & quantity of your thumbnails and the amount of traffic you anticipate it should be pretty easy to come up with cost estimates.

FYI, you might also want to take a look at some of these slideshows and/or these videos. These are all from Amazon's re:Invent conferences, and these links should provide you with S3-specific presentations at those conferences.

like image 21
Bruce P Avatar answered Nov 16 '22 01:11

Bruce P