Amazon EBS, snapshots as incremental backups

Tags:

I'm working on an automated mechanism for our EBS volumes to be backed up on a daily basis.

I know quite well the steps to create a new snapshot. Apparently it's all quite simple, you have an EBS volume which you can snapshot, and you can restore the snapshot anytime. Fine.

But my concern is about the size of the snapshots, I know these snapshots are stored with compression in S3, and we're going to be charged depending on the size of the snapshots. If we have large amounts of data we'll have a significant amount increase in the invoice for each backup we make.

However, according to Amazon's pages, these snapshots are incremental. That'd solve my problem as the daily backup would only upload the data which has changed since the last snapshot. But this leads me to next question: if the backup is incremental and we're only uploading the modified data, where's the original data being stored? (ie. the first snapshot which obviously couldn't have been done incrementally...)

Unfortunately I haven't been able to find this information all over Amazon's documents.

Does anybody have experience with snapshots and its billing?

I'd appreciate any help, thanks!

260

asked Jun 24 '11 14:06

xuuso

1 Answers

I don't think that you'll find detailed documentation as to how the snapshots are implemented; it's not something I have come across. They do have documentation for "Projecting Costs". However, I think if you know how it works, you can intuit the bill, and feel more at ease with it.

Note that these snapshots are not "incremental" in the way we may have come to understand that term in the DOS operating system. In DOS, the "archive" bit was set when a file was modified, and an "incremental" backup copied only the files that had it's "archive" bit set. The backup process would clear the archive attribute, so a future edit to the file would cause it to be backed up "incrementally" once again.

With snapshots, each block of the volume is flagged if it is modified. It's not done on a file by file basis. After the first snapshot, only blocks that have been flagged as modified are backed up, just like "incremental" backups in DOS. But that's where the similarities end, because with each block that it doesn't have to copy it doesn't just skip it, it writes a pointer to where the last (unchanged) copy of the data is.

The first snapshot you make of a volume, the data is broken up into blocks. From Amazon: "Volume data is broken up into chunks before being transferred to Amazon S3. While the size of the chunks could change through future optimizations, the number [...] can be estimated by dividing the size of the data that has changed since the last snapshot by 4MB."

The next snapshot you make consists of data for only those blocks that have changed, and pointers to the blocks that haven't changed. Those pointers point to blocks of data in the previous snapshot.

The next snapshot (n) is made by recording data of each block changed since the previous snapshot (n-1), along with pointers for the blocks that haven't changed since the previous snapshot (n-1). These pointers point to corresponding blocks in the previous snapshot, which may contain data, or another pointer to its previous snapshot. Eventually, every pointer ends up at a block of real data, (that hasn't changed since that snapshot was created).

Now let's say you decide to delete snapshot (x). Snapshot (x) has snapshots made before it (x-1), and after it (x+1). Amazon replaces the pointers in snapshot (x+1) with pointers and data from snapshot (x) (the one being deleted). As a result, any actual data in snapshot (x) is copied to snapshot (x+1), unless it has it's own copy of more recent data for that block there.

This is how snapshots work, where the data is stored, and why the size of the snapshots are manageable. You can understand from this how deleting a snapshot will destroy only your ability to bring back the volume as it was at the point in time when that snapshot was created, without destroying the ability to use your other snapshots. Unlike simple, traditional "incremental" backups that don't utilize pointers, snapshots not being deleted are updated as needed to maintain their usefulness when one of its dependent snapshots are deleted. This is why it makes sense that Amazon charges more for intelligent snapshot storage than simple copies of EBS volumes. Finally, it's understandable that it's difficult to predict how much snapshot storage is going to cost, since it is so dynamic.

answered Sep 23 '22 04:09

John Langstaff

Related questions
                            
                                Unable to trigger mouse event in Capybara test
                            
                                generate_series() equivalent in MySQL
                            
                                How to make javac find JAR files? (Eclipse can see them)
                            
                                Using named parameters with PDO for LIKE
                            
                                How to access Gnuplot's (auto)range values and modify them to add some margin?
                            
                                Proper way to use versions and milestones
                            
                                How to list local commits difference in git
                            
                                Idiomatic use of parentheses in Ruby
                            
                                add method to loaded class/module in python
                            
                                Array intersection in Bash [duplicate]
                            
                                Pinging an IP address using PHP and echoing the result
                            
                                How to generate MD5 hash code for my WinRT app using C#?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With