I'm working on an automated mechanism for our EBS volumes to be backed up on a daily basis.
I know the steps to create a new snapshot quite well. It all seems quite simple: you have an EBS volume, you can snapshot it, and you can restore the snapshot at any time. Fine.
But my concern is the size of the snapshots. I know these snapshots are stored compressed in S3, and that we're charged according to their size. If we have large amounts of data, each backup we make will add significantly to the invoice.
However, according to Amazon's pages, these snapshots are incremental. That would solve my problem, as the daily backup would only upload the data that has changed since the last snapshot. But this leads me to the next question: if the backup is incremental and we're only uploading the modified data, where is the original data stored? (i.e., the first snapshot, which obviously couldn't have been done incrementally...)
Unfortunately, I haven't been able to find this information anywhere in Amazon's documentation.
Does anybody have experience with snapshots and their billing?
I'd appreciate any help, thanks!
Amazon EBS Snapshots are incremental, storing only the changes since the last snapshot, making them cost-effective and ideal for frequent backups. You can use tools such as AWS Cost Explorer to track snapshot usage and spend, and further optimize storage costs as needed.
Snapshot pricing: Charges for your snapshots are based on the amount of data stored. Because snapshots are incremental, deleting a snapshot might not reduce your data storage costs.
AWS snapshots vs. backups, not the same: An AWS snapshot is just a point-in-time copy of an Amazon EBS volume with limited storage and recovery options. A backup is a more comprehensive and flexible copy of your VMs that offers reliable protection and ensures fast and consistent recovery.
Incremental backups: The first backup of an AWS resource backs up a full copy of your data. For each successive incremental backup, only the changes to your AWS resources are backed up. Incremental backups enable you to benefit from the data protection of frequent backups while minimizing storage costs.
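To put rough numbers on "first backup full, then only the changes", here is a minimal projection under simplifying assumptions of my own: every daily snapshot is kept, each day changes a fixed amount of data, and compression is ignored. The function name and the figures in the example are hypothetical; this is not AWS's billing formula.

```python
def projected_snapshot_storage_gb(used_gb: float, daily_change_gb: float, days: int) -> float:
    """Rough stored size after `days` daily snapshots, all retained.

    Hypothetical model, not AWS's billing formula:
    - the first snapshot stores all used data in full;
    - each later snapshot stores only the data changed that day;
    - compression is ignored.
    """
    if days <= 0:
        return 0
    return used_gb + daily_change_gb * (days - 1)


# Hypothetical example: 100 GB in use, 2 GB changed per day, 30 daily snapshots.
print(projected_snapshot_storage_gb(100, 2, 30))  # 158
```

Under this model the bill grows with the amount of changed data, not with the number of full volume copies.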
I don't think that you'll find detailed documentation as to how the snapshots are implemented; it's not something I have come across. They do have documentation for "Projecting Costs". However, I think if you know how it works, you can intuit the bill, and feel more at ease with it.
Note that these snapshots are not "incremental" in the way we may have come to understand that term in the DOS operating system. In DOS, the "archive" bit was set when a file was modified, and an "incremental" backup copied only the files that had their "archive" bit set. The backup process would clear the archive attribute, so a future edit to the file would cause it to be backed up "incrementally" once again.
With snapshots, each block of the volume is flagged when it is modified; this is not done on a file-by-file basis. After the first snapshot, only blocks that have been flagged as modified are backed up, just like "incremental" backups in DOS. But that's where the similarities end: for each block that it doesn't have to copy, the snapshot doesn't simply skip it; it writes a pointer to where the last (unchanged) copy of that data is.
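For contrast, the DOS-style scheme can be sketched as a toy model. `ArchiveBitFS` is a hypothetical class for illustration only; note that a whole file is copied again even if only one byte of it changed.

```python
class ArchiveBitFS:
    """Toy model of DOS-style, file-level incremental backup (illustrative only)."""

    def __init__(self):
        self.files = {}       # file name -> content
        self.archive = set()  # names whose "archive" bit is set

    def write(self, name, content):
        # Any modification sets the file's archive bit.
        self.files[name] = content
        self.archive.add(name)

    def incremental_backup(self):
        # Copy only files with the archive bit set, then clear the bit.
        copied = {name: self.files[name] for name in sorted(self.archive)}
        self.archive.clear()
        return copied


fs = ArchiveBitFS()
fs.write("a.txt", "v1")
fs.write("b.txt", "v1")
print(fs.incremental_backup())  # {'a.txt': 'v1', 'b.txt': 'v1'}
fs.write("a.txt", "v2")
print(fs.incremental_backup())  # {'a.txt': 'v2'}
```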
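The block-level behaviour can be sketched with a toy block device. This is only a model under my own assumptions: `ToyVolume` is hypothetical, and `None` stands in for "pointer to this block in the previous snapshot", since Amazon doesn't document the real on-disk format.

```python
POINTER = None  # toy marker: "same as this block in the previous snapshot"


class ToyVolume:
    """Toy block device that flags modified blocks (illustrative model only)."""

    def __init__(self, blocks):
        self.blocks = list(blocks)
        self.dirty = set(range(len(self.blocks)))  # nothing backed up yet
        self.snapshots = []

    def write(self, index, data):
        self.blocks[index] = data
        self.dirty.add(index)  # flag the block, not the file

    def snapshot(self):
        # Copy flagged blocks; record a pointer marker for every other block.
        snap = [self.blocks[i] if i in self.dirty else POINTER
                for i in range(len(self.blocks))]
        self.dirty.clear()
        self.snapshots.append(snap)
        return snap


vol = ToyVolume(["A", "B", "C", "D"])
print(vol.snapshot())  # ['A', 'B', 'C', 'D']
vol.write(2, "C2")
print(vol.snapshot())  # [None, None, 'C2', None]
```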
When you make the first snapshot of a volume, the data is broken up into blocks. From Amazon: "Volume data is broken up into chunks before being transferred to Amazon S3. While the size of the chunks could change through future optimizations, the number [...] can be estimated by dividing the size of the data that has changed since the last snapshot by 4MB."
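The division in that quote is easy to sketch. Two assumptions of mine: "4MB" is read as 4 MiB, and the result is rounded up, since a partial chunk still has to be sent. The quote itself warns the chunk size may change, so treat this as a rough estimate only; `estimated_chunks` is a hypothetical helper.

```python
CHUNK_BYTES = 4 * 1024 * 1024  # assumption: "4MB" read as 4 MiB; may change per the quote


def estimated_chunks(changed_bytes: int) -> int:
    """Rough number of 4 MB chunks spanned by the data changed since the last snapshot."""
    # Integer ceiling division: a partially filled chunk still counts as one.
    return (changed_bytes + CHUNK_BYTES - 1) // CHUNK_BYTES


print(estimated_chunks(10 * 1024 * 1024))  # 3
print(estimated_chunks(1024 ** 3))         # 256
```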
The next snapshot you make consists of data for only those blocks that have changed, and pointers to the blocks that haven't changed. Those pointers point to blocks of data in the previous snapshot.
The next snapshot (n) is made by recording the data of each block changed since the previous snapshot (n-1), along with pointers for the blocks that haven't changed since the previous snapshot (n-1). These pointers point to the corresponding blocks in the previous snapshot, which may contain data, or another pointer to its own previous snapshot. Eventually, every pointer ends up at a block of real data (one that hasn't changed since that snapshot was created).
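Restoring snapshot (n) then means following each block's pointer chain back to real data. A minimal sketch in the same toy representation (a list per snapshot, `None` as the pointer to the previous snapshot); the function names and sample data are hypothetical.

```python
def resolve_block(snapshots, n, index):
    """Follow pointers back from snapshot n until a block of real data is found.

    Toy model: an entry is block data, or None meaning
    "same as this block in the previous snapshot".
    """
    for k in range(n, -1, -1):  # walk the pointer chain n, n-1, ..., 0
        data = snapshots[k][index]
        if data is not None:
            return data
    raise ValueError(f"block {index} has no data in snapshots 0..{n}")


def restore(snapshots, n):
    """Rebuild the whole volume as it was when snapshot n was taken."""
    return [resolve_block(snapshots, n, i) for i in range(len(snapshots[n]))]


snapshots = [
    ["A", "B", "C", "D"],      # snapshot 0: full copy
    [None, None, "C2", None],  # snapshot 1: block 2 changed
    ["A3", None, None, None],  # snapshot 2: block 0 changed
]
print(restore(snapshots, 1))  # ['A', 'B', 'C2', 'D']
print(restore(snapshots, 2))  # ['A3', 'B', 'C2', 'D']
```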
Now let's say you decide to delete snapshot (x). Snapshot (x) has a snapshot made before it (x-1) and one made after it (x+1). Amazon replaces the pointers in snapshot (x+1) with the pointers and data from snapshot (x) (the one being deleted). As a result, any actual data in snapshot (x) is copied to snapshot (x+1), unless (x+1) already has its own, more recent data for that block.
This is how snapshots work, where the data is stored, and why the size of the snapshots is manageable. You can see from this how deleting a snapshot destroys only your ability to bring back the volume as it was at the point in time when that snapshot was created, without destroying the ability to use your other snapshots. Unlike simple, traditional "incremental" backups that don't use pointers, the snapshots that are not being deleted are updated as needed to keep them usable when a snapshot they depend on is deleted. This is why it makes sense that Amazon charges more for intelligent snapshot storage than for simple copies of EBS volumes. Finally, it's understandable that it's difficult to predict how much snapshot storage is going to cost, since it is so dynamic.
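The deletion step can be sketched in the same toy model. This is an illustration under my own assumptions, not Amazon's implementation: `delete_snapshot`, `restore` and `stored_blocks` are hypothetical, and counting data entries is only a stand-in for "the amount of data stored" that snapshots are billed on. It also shows why deleting a snapshot may free little storage: most of its data just moves into (x+1).

```python
def delete_snapshot(snapshots, x):
    """Delete snapshot x, moving whatever snapshot x+1 still depends on into it.

    Toy model: an entry is block data, or None meaning "see previous snapshot".
    """
    if x + 1 < len(snapshots):
        nxt = snapshots[x + 1]
        for i, entry in enumerate(nxt):
            if entry is None:
                # x+1 pointed at x: inherit x's data (or x's own pointer to x-1).
                nxt[i] = snapshots[x][i]
    # Data in x that x+1 had overwritten is simply dropped here.
    del snapshots[x]


def restore(snapshots, n):
    """Rebuild the volume at snapshot n by walking back to real data."""
    volume = []
    for i in range(len(snapshots[n])):
        k = n
        while snapshots[k][i] is None:
            k -= 1
        volume.append(snapshots[k][i])
    return volume


def stored_blocks(snapshots):
    """Count of real data blocks held across all snapshots (toy stand-in for billed data)."""
    return sum(entry is not None for snap in snapshots for entry in snap)


snapshots = [
    ["A", "B", "C", "D"],      # snapshot 0: full copy
    [None, None, "C2", None],  # snapshot 1
    [None, None, "C3", None],  # snapshot 2
]
before = restore(snapshots, 2)
print(stored_blocks(snapshots))         # 6
delete_snapshot(snapshots, 1)
print(restore(snapshots, 1) == before)  # True
print(stored_blocks(snapshots))         # 5
```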