 

Advice for data storage on Amazon EC2 especially for databases [closed]

I've been playing around with Amazon's Web Services for over a year now, but I still don't quite understand how it works. For example, when I select an AMI of my choice from the EC2 console and continue through the wizard, I reach the "Storage Configuration" tab. There are several options here.

There is the root volume tab and then there is the EBS volume tab. How do these two differ? What is the maximum size I can allocate for each? How can I configure the EBS volumes to work with my instance? Say, for example, I decide to create 8 EBS volumes, each with 25 GB of storage. For something like a PostgreSQL database, which naturally lives on the root device, how do I configure it so the database is stored across all 8 EBS volumes? In a sense, the 8 EBS volumes would become one 200 GB drive, with the PostgreSQL data stored across that whole drive.

Any form of clarification will be appreciated.

deadlock asked Apr 16 '13 17:04



1 Answer

You should read the benefits of EBS vs instance store. I also wrote a bit about the PostgreSQL angle of this on my work blog recently. See also what root device to use for a new EC2 instance and the other questions listed in the Related sidebar.

Instance store will eventually EAT YOUR DATA unless you carefully set up replication and regular backups. If an instance fails or is terminated, you cannot get your data back if it's on an instance store. You need good backups anyway; it's just more important with instance store, and you need to be more careful about having near-real-time replication set up.

On the other hand, EBS is more likely to be affected by outages and faults that render it unavailable for a time; your data may still exist, but if you can't get to it for a couple of hours you can't fail over until the fault is fixed. So you really need good backups and replication anyway.

Quick answers follow; I'll leave the detailed explanations to the post:

  • The root volume is either EBS or instance store, depending on AMI type.

  • In the volumes tab you can add additional volumes. You can choose whether these are EBS or instance store volumes at volume creation time, irrespective of the AMI type. Different instance sizes have different limits on number and size of instance store volumes, but all have the same limits on EBS volumes.

  • The maximum size of an instance store volume is defined by the instance type. See the documentation for your instance. The maximum size of an EBS volume is in the first paragraph of the EBS documentation:

    Amazon EBS volumes are created in a particular Availability Zone and can be from 1 GB to 1 TB in size.

  • The PostgreSQL database doesn't really "naturally live on the root volume". It lives where you put it. If you're using a version installed by a package manager, it'll usually be put in /var/lib/pgsql or /var/lib/postgres, but you can change the startup script options to move it elsewhere, replace that directory with a symlink to the desired location, or mount a new volume at that point. There are ample discussions of how to move PostgreSQL on Stack Overflow, dba.stackexchange.com and serverfault, so I won't repeat all that here.

  • To combine multiple EBS volumes use Linux's software RAID (md). EBS is just like any other disk as far as Linux is concerned, so see the usual documentation for setting up Linux software RAID.
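
As a rough illustration of the last two points, here is a minimal Python sketch that only plans (prints) the commands for striping the eight 25 GB volumes from the question into one md array and mounting it where PostgreSQL keeps its data. It is a sketch under assumptions, not a tested recipe: RAID level 0 is my assumption (the answer only says "software RAID"), and the device names /dev/xvdf to /dev/xvdm, the array name /dev/md0, the ext4 filesystem and the helper function names are all hypothetical. Check how the volumes actually show up on your instance before running anything like this.

```python
import shlex


def raid0_capacity_gb(volume_sizes_gb):
    """Approximate usable size of a RAID 0 (striped) array.

    RAID 0 keeps no redundancy, so the usable space is roughly the sum of
    the members (md metadata overhead is ignored here).
    """
    return sum(volume_sizes_gb)


def plan_md_array(devices, md_device="/dev/md0",
                  mount_point="/var/lib/pgsql", level=0):
    """Return the commands (as argument lists) that would build, format
    and mount the array. Nothing is executed here."""
    return [
        # Create the software RAID array from the attached EBS volumes.
        ["mdadm", "--create", md_device, f"--level={level}",
         f"--raid-devices={len(devices)}", *devices],
        # Put a filesystem on the new array (ext4 is an assumption).
        ["mkfs.ext4", md_device],
        # Mount it where the packaged PostgreSQL expects its data.
        ["mount", md_device, mount_point],
    ]


if __name__ == "__main__":
    # Hypothetical device names for 8 attached EBS volumes.
    devices = [f"/dev/xvd{letter}" for letter in "fghijklm"]
    print(f"Usable capacity: {raid0_capacity_gb([25] * 8)} GB")
    for command in plan_md_array(devices):
        print(shlex.join(command))
```

The commands are printed rather than run because they are destructive and need root. Note that mounting over an existing data directory hides whatever is already there, so you would stop PostgreSQL and copy the existing data onto the new filesystem first. RAID 0 also means that losing any one volume loses the whole array, which is one more reason for the backups and replication mentioned above.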

Personally, I've been quite unimpressed with the performance of EC2, at least with PostgreSQL. You can get a very fast database running, but only at a pretty crushing price. It's very convenient if you want to fire up some big databases for a short-term job, but it isn't economical as a long-lived hosting option; you're better off looking at VPS providers that offer better I/O performance. Search ServerFault, dba.stackexchange.com, etc.

Finally, a reminder: instance store on high I/O instances seems to be faster than the other options ... but if you stop or terminate your instance, or the instance fails, you will lose all data on your instance store volumes (a plain reboot keeps it), so you must have good backups and real-time replication if you're going to use the instance store.

Craig Ringer answered Oct 17 '22 00:10
