I have a PostgreSQL database on Amazon EC2 and need to determine the best way to keep this data backed up. I am considering two options:
(1) Mount an EBS volume at a directory such as /pgsqldata and use it as the PostgreSQL data directory (on Amazon Linux the default is /var/lib/pgsql/data/). This volume would then get frequent snapshots.
or
(2) Keep the PostgreSQL data directory in its default location. Then use pg_dump to frequently dump backups to a location like /pgsqldumps, and snapshot that volume after each pg_dump.
A third option would be to simply snapshot the root device volume (I am using an EBS-backed instance) since it is both a webserver and database in my case. I like the idea of having a dedicated volume for data backups though.
Finally, if I am taking direct snapshots of the live PostgreSQL data directory, do I need to worry about possible changes to the database during the snapshot process?
Thanks
Currently AWS Backup is the preferred solution. AWS Backup is a fully managed service that not only protects EBS volumes, but also offers backup capabilities for EC2 instances, Amazon RDS, Storage Gateway, DynamoDB, EFS, and Aurora.
When you create an EBS volume based on a snapshot, the new volume begins as an exact replica of the original volume that was used to create the snapshot. The replicated volume loads data in the background so that you can begin using it immediately.
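To make the restore path concrete, here is a minimal sketch in Python. It assumes `ec2` is a boto3 EC2 client (for example `boto3.client("ec2")`); the function name, the default device name `/dev/sdf`, and the IDs in the usage note are illustrative assumptions, not something from the original answer.

```python
def restore_volume_from_snapshot(ec2, snapshot_id, availability_zone,
                                 instance_id, device="/dev/sdf"):
    """Create an EBS volume from a snapshot and attach it to an instance.

    `ec2` is assumed to be a boto3 EC2 client. The device name is a
    placeholder; pick one that is free on your instance.
    """
    # The new volume must live in the same Availability Zone as the instance.
    volume = ec2.create_volume(
        SnapshotId=snapshot_id,
        AvailabilityZone=availability_zone,
    )
    volume_id = volume["VolumeId"]

    # Block until the volume reaches the 'available' state.
    ec2.get_waiter("volume_available").wait(VolumeIds=[volume_id])

    # Attach it; the data keeps loading from the snapshot in the background.
    ec2.attach_volume(VolumeId=volume_id, InstanceId=instance_id, Device=device)
    return volume_id
```

A call might look like `restore_volume_from_snapshot(boto3.client("ec2"), "snap-0123", "us-east-1a", "i-0abc")`, after which the volume can be mounted on the instance and used as the data directory.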
I recommend that you tag your multi-volume snapshots so that you can manage them collectively during restore, copy, or retention. Typically, multi-volume, crash-consistent snapshots are restored as a set.
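As a rough illustration of tagging a multi-volume snapshot set, the sketch below builds the arguments for the EC2 `CreateSnapshots` call (boto3's `create_snapshots`), which snapshots all volumes attached to an instance at once. The tag key `backup-set`, the description text, and the function names are assumptions chosen for the example.

```python
def build_snapshot_request(instance_id, backup_set,
                           description="PostgreSQL backup"):
    """Build keyword arguments for a boto3 EC2 create_snapshots call."""
    return {
        # Snapshot every volume attached to the instance, root volume included.
        "InstanceSpecification": {
            "InstanceId": instance_id,
            "ExcludeBootVolume": False,
        },
        "Description": description,
        # One shared tag lets the whole set be found and restored together.
        "TagSpecifications": [{
            "ResourceType": "snapshot",
            "Tags": [{"Key": "backup-set", "Value": backup_set}],
        }],
        # Also carry over any tags already present on the source volumes.
        "CopyTagsFromSource": "volume",
    }


def snapshot_instance_volumes(ec2, instance_id, backup_set):
    """Take the tagged snapshot set; `ec2` is assumed to be a boto3 EC2 client."""
    response = ec2.create_snapshots(
        **build_snapshot_request(instance_id, backup_set)
    )
    return [snap["SnapshotId"] for snap in response["Snapshots"]]
```

Filtering snapshots by the shared tag value later gives you the complete set to restore, copy, or expire together.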
Amazon Data Lifecycle Manager (DLM) policies and backup plans created in AWS Backup work independently from each other and provide two ways to manage EBS Snapshots. DLM provides a simple way to manage the lifecycle of EBS resources, such as volume snapshots.
You should move the data directory to its own EBS volume anyway. This helps with write contention on the EBS volumes, among other benefits. In addition, I have the logs writing to their own volume and back those up as well.
To answer the question, I do both: I snapshot the EBS volume and also take a dump of the database. This way, if you want to sync your live data to a dev box (depending on the PII in the database), it is easy with a dump and restore, but you can also restore a new instance and attach a snapshot just as easily.
If your database dump is less than 5 GB, you can sync it to S3 and forget about having to store the backups on their own volume. If it isn't, you will need to store it on its own EBS volume that is then also snapshotted on a regular basis.
Here is a script I wrote to do this. It might be outdated, but it should work.
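The script itself is not reproduced above. The following is not the author's script, only a minimal sketch of the dump-then-upload step. It assumes `pg_dump` and the AWS CLI (`aws`) are installed and configured on the instance; the function names, the `pgsqldumps/` key prefix, and the file naming scheme are hypothetical.

```python
import datetime
import subprocess


def dump_filename(db_name, now=None):
    """Return a timestamped file name such as mydb-20240102T030405Z.dump."""
    now = now or datetime.datetime.now(datetime.timezone.utc)
    return f"{db_name}-{now:%Y%m%dT%H%M%SZ}.dump"


def pg_dump_command(db_name, out_path):
    """Build a pg_dump command using the custom format (for pg_restore)."""
    return ["pg_dump", "--format=custom", f"--file={out_path}", db_name]


def s3_upload_command(local_path, bucket, key):
    """Build an `aws s3 cp` command that copies the dump to S3."""
    return ["aws", "s3", "cp", local_path, f"s3://{bucket}/{key}"]


def backup(db_name, dump_dir, bucket):
    """Dump the database to dump_dir, then upload the dump to the bucket."""
    name = dump_filename(db_name)
    path = f"{dump_dir}/{name}"
    # check=True raises CalledProcessError if either step fails.
    subprocess.run(pg_dump_command(db_name, path), check=True)
    subprocess.run(s3_upload_command(path, bucket, f"pgsqldumps/{name}"), check=True)
    return path
```

Something like `backup("mydb", "/pgsqldumps", "my-backup-bucket")` could then be run from cron, with the EBS snapshots scheduled separately.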