
How do I take a backup of AWS EC2 instance/ephemeral storage?

I keep my database at /mnt, on the ephemeral storage that comes with the EC2 instance. To take a backup using the EC2 API tools we need a volume ID, but in the AWS console I can only find the volume ID of the 8 GB root storage.

What should I do if I want to back up the ephemeral storage? Is there an alternative way of backing up instance storage?

Smita asked May 25 '12 05:05


2 Answers

Ephemeral storage, or instance storage, as-is, is like a /tmp folder: its contents can disappear at any time. Ephemeral drive contents do survive a soft reboot, but they should be treated as if they didn't, since you can't realistically control or predict when your instance decides to die.

This has already been pointed out.

What I'd like to point out is that if you create and configure your AMIs appropriately, you can still use the ephemeral storage to drastically improve (read) throughput, so long as you also keep EBS drives for the actual storage.

What I'm using at the moment is Linux (Ubuntu Trusty Tahr) instances with bcache. I chose Tahr mainly because bcache kernel support is relatively new (IIRC, the first kernel with bcache was 3.10), and you'd definitely want as recent a kernel as possible. Also, Tahr is the next LTS version of Ubuntu, and it will be final by the time my project is close to launch ;)

Bcache, in its default configuration, allows you to benefit from the read speed of the ephemeral storage while giving you the persistence of EBS: It takes a fast cache device (ephemeral SSD) and uses it to speed up a slow device (EBS), writing through the cache device (that is, writing simultaneously to ephemeral cache and EBS).

This means that should an instance crash or otherwise be stopped, you can still mount the EBS volume directly without the cache, and access all your data as you would otherwise using only EBS volumes. You can also set up the now-wiped ephemeral devices again as a cache for the EBS volume to get back to enjoying very fast reads and seeks.

My particular setup is two EBS devices, striped (RAID 0) using mdadm, plus two ephemeral SSD devices striped in the same manner. I've then configured them with bcache, using the ephemeral array as the cache device and the EBS array as the backing device. The EBS drives can be any size, and you can always expand them (a bit tricky with EC2, because at the time of writing you can't resize an existing EBS volume: you have to create a snapshot of the current EBS volumes and then create new, larger ones based on that snapshot).

Of course, you'll have to create a script that runs inside your instance at startup to configure the ephemeral storage and attach it as a cache device to your EBS-backed backing device. I encourage reading up on, and experimenting with, mdadm and bcache.
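As a rough illustration of what such a startup script has to do, here is a minimal Python sketch that only builds the list of shell commands and does not run anything. It is not my actual script. The device names (/dev/xvdb, /dev/xvdc, /dev/md0, bcache0) and the function name are assumptions for illustration, and it assumes the EBS array has already been formatted as a bcache backing device and shows up as bcache0. Recovering a backing device whose old cache set has vanished may need extra steps that are not shown here.

```python
from typing import List


def bcache_attach_plan(
    ephemeral_devices: List[str],
    cset_uuid: str,
    cache_array: str = "/dev/md0",   # hypothetical name for the ephemeral RAID 0 array
    bcache_device: str = "bcache0",  # hypothetical name of the existing bcache device
    cache_mode: str = "writethrough",
) -> List[str]:
    """Return the shell commands for (re)building the ephemeral cache, in order.

    Nothing is executed here; review the plan and run it as root yourself.
    cset_uuid is the UUID of the newly created cache set (it appears as a
    directory under /sys/fs/bcache once the cache device is registered).
    """
    if cache_mode not in ("writethrough", "writearound"):
        # writeback would leave unflushed data on storage that can vanish
        raise ValueError("use writethrough or writearound with ephemeral caches")
    if len(ephemeral_devices) < 2:
        raise ValueError("striping needs at least two ephemeral devices")

    devices = " ".join(ephemeral_devices)
    sysfs = f"/sys/block/{bcache_device}/bcache"
    return [
        # 1. Stripe the freshly wiped ephemeral disks into one array.
        f"mdadm --create {cache_array} --level=0 "
        f"--raid-devices={len(ephemeral_devices)} {devices}",
        # 2. Format the array as a bcache cache device.
        f"make-bcache -C {cache_array}",
        # 3. Register it with the kernel (udev may already do this for you).
        f"echo {cache_array} > /sys/fs/bcache/register",
        # 4. Attach the new cache set to the EBS-backed bcache device.
        f"echo {cset_uuid} > {sysfs}/attach",
        # 5. Make sure writes always reach EBS.
        f"echo {cache_mode} > {sysfs}/cache_mode",
    ]


if __name__ == "__main__":
    for command in bcache_attach_plan(["/dev/xvdb", "/dev/xvdc"], "<cset-uuid>"):
        print(command)
```

Keeping the plan as plain strings makes it easy to log or review before anything touches a disk.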

For the record, testing with the Cassandra stress tool, I get better read performance with EBS volumes bcached with the ephemeral drives than I do with just striping the ephemeral drives. This is because of the algorithm used in bcache, which is very clever.

Using the ephemeral drives as a cache also reduces network traffic and is cost-effective, as it reduces I/O on EBS, and thereby your monthly bill.

Also note the different types of caching bcache provides:

  1. Write back: Use the SSD as a read/write device, and only write to the backing device when pages need to be evicted from the cache. This is not useful for EC2 ephemeral setups, as it will render your backing device useless on a crash or stop.
  2. Write through: All writes go to both the cache and the backing device. This ensures that the backing device is always as up-to-date as the cache device, and it can always be used without the cache device. Useful for EC2.
  3. Write around: All writes go directly to the backing device, and are not written to the cache device until a read request happens for that data some time in the future. Only reads are cached on the cache device. This is as safe as write through, and is useful if you know that your writes are not likely to be read in the near future. It avoids filling the cache device with data that isn't requested often, so that there's more space for data that is. A couple of examples could be a file upload server, a system where you write a lot of logging data, etc. If you know that your entire data set is significantly larger than the ephemeral storage size, this is likely to be the most efficient option in a large number of use cases.
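
To make the difference between the three modes concrete, here is a toy Python model. It is only a sketch: two dictionaries stand in for the ephemeral cache and the EBS-backed device, and the class and function names are made up for illustration. It says nothing about how bcache is implemented internally, it only shows where a write lands in each mode and whether it survives losing the cache.

```python
class ToyCachedDisk:
    """Toy model: 'cache' is the ephemeral SSD, 'backing' is the EBS array."""

    def __init__(self, mode: str) -> None:
        if mode not in ("writeback", "writethrough", "writearound"):
            raise ValueError(f"unknown mode: {mode}")
        self.mode = mode
        self.cache = {}
        self.backing = {}

    def write(self, key: str, value: str) -> None:
        if self.mode == "writeback":
            # Only the cache sees the write until the block is evicted.
            self.cache[key] = value
        elif self.mode == "writethrough":
            # Cache and backing device are updated together.
            self.cache[key] = value
            self.backing[key] = value
        else:  # writearound
            # The write bypasses the cache; drop any stale cached copy.
            self.backing[key] = value
            self.cache.pop(key, None)

    def read(self, key: str) -> str:
        if key in self.cache:
            return self.cache[key]
        value = self.backing[key]
        self.cache[key] = value  # reads populate the cache in every mode
        return value

    def lose_cache(self) -> None:
        """Simulate an instance stop: the ephemeral device comes back empty."""
        self.cache.clear()


def survives_cache_loss(mode: str) -> bool:
    disk = ToyCachedDisk(mode)
    disk.write("row-1", "important data")
    disk.lose_cache()
    return "row-1" in disk.backing


if __name__ == "__main__":
    for mode in ("writeback", "writethrough", "writearound"):
        print(mode, survives_cache_loss(mode))
```

In this model only the write back mode loses the row, which is why it is the one to avoid on ephemeral storage.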

DanielSmedegaardBuus answered Sep 25 '22 17:09


First and foremost, you should never store anything of lasting value on ephemeral storage in Amazon EC2, unless you know exactly what you are doing and are prepared to always have point-in-time backups etc. Your question seems to indicate that you might be mistaken about the concept of ephemeral storage, the difference between Amazon EC2 Instance Storage and Amazon EBS, and the significant implications regarding data safety and backup requirements:

Ephemeral storage will be lost on stop/start cycles and can generally go away, so you definitely don't want to put anything of lasting value there, i.e. only put temporary data there that you can afford to lose or rebuild easily, like a swap file or strictly temporary data in use during computations. Of course you might store huge indexes there for example, but you must be prepared to rebuild these after the storage has been cleared for whatever reason (instance stop, hardware failure, ...).

  • That's one of the many reasons Eric Hammond excellently summarized in "You Should Use EBS Boot Instances on Amazon EC2", which outlines the history of and differences between the two storage concepts and assesses the few remaining possible benefits of ephemeral storage (mainly being plentiful and free).

Problem/Solution

These explanations should clarify why you are unable to back up the ephemeral storage volumes with a mechanism that solely applies to EBS volumes (i.e. EBS snapshots). Instead, you can back up the former via a regular operating-system-level backup tool of your choice; Duplicity is a popular choice that can optionally store backups in Amazon S3, as addressed in my answer to "Easiest to use backup software for live linux server".
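
To show what an operating-system-level backup means in its simplest form, here is a minimal Python sketch that packs a directory into a timestamped tar archive. This is not Duplicity and has none of its incremental or encryption features. The function name and the paths in the usage line are assumptions for illustration. The destination has to be on persistent storage (an EBS volume, or a location you then copy to S3), and a live database generally needs a consistent dump or a stopped service before its files are archived.

```python
import tarfile
import time
from pathlib import Path


def archive_directory(source_dir, dest_dir) -> Path:
    """Pack source_dir into dest_dir/<name>-<UTC timestamp>.tar.gz and return its path."""
    source = Path(source_dir)
    dest = Path(dest_dir)
    if not source.is_dir():
        raise NotADirectoryError(str(source))
    dest.mkdir(parents=True, exist_ok=True)

    name = source.name or "root"  # Path("/").name is empty
    stamp = time.strftime("%Y%m%dT%H%M%SZ", time.gmtime())
    archive = dest / f"{name}-{stamp}.tar.gz"

    # arcname keeps paths inside the archive relative, e.g. "mnt/db/file"
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(source, arcname=name)
    return archive


if __name__ == "__main__":
    # Hypothetical paths: /mnt is the ephemeral mount from the question,
    # /backups is assumed to live on an EBS volume.
    print(archive_directory("/mnt", "/backups"))
```

Run from cron, this gives you periodic point-in-time copies; a dedicated tool adds incremental backups and remote storage on top of the same idea.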

Steffen Opel answered Sep 21 '22 17:09