I have 2 million zipped HTML files (100-150KB) being added each day that I need to store for a long time. Hot data (70-150 million) is accessed semi regularly, anything older than that is barely ever accessed.
This means each day I'm storing an additional 200-300GB worth of files.
Now, Standard storage costs $0.023 per GB and $0.004 for Glacier.
While Glacier is cheap, the problem with it is that it has additional costs, so it would be a bad idea to dump 2 million files into Glacier:
PUT requests to Glacier $0.05 per 1,000 requests
Lifecycle Transition Requests into Glacier $0.05 per 1,000 requests
Is there a way of gluing the files together, but keeping them accessible individually?
Small Files Create Too Much Latency For Data Analytics Since streaming data comes in small files, typically you write these files to S3 rather than combine them on write. But small files impede performance. This is true regardless of whether you're working with Hadoop or Spark, in the cloud or on-premises.
Q: How much data can I store in Amazon S3? The total volume of data and number of objects you can store are unlimited. Individual Amazon S3 objects can range in size from a minimum of 0 bytes to a maximum of 5 TB. The largest object that can be uploaded in a single PUT is 5 GB.
What is Amazon S3? Amazon Simple Storage Service (Amazon S3) is a scalable, high-speed, web-based cloud storage service. The service is designed for online backup and archiving of data and applications on Amazon Web Services (AWS).
Amazon S3 is a service that enables you to store your data (referred to as objects) at massive scale. In this guide, you will create an Amazon S3 bucket (a container for data stored in S3), upload a file, retrieve the file, and delete the file.
An important point, that if you need to provide quick access to these files, then Glacier can give you access to the file in up to 12 hours. So the best you can do is to use S3 Standard – Infrequent Access
(0,0125 USD per GB with millisecond access) instead of S3 Standard
. And maybe for some really not using data Glacier
. But it still depends on how fast do you need that data.
Having that I'd suggest following:
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With