We're switching from storing all user-uploaded files on our own servers to Amazon S3. It's approximately 300 GB of files.
What is the best way to keep a backup of all the files? I've seen a few different suggestions.
Pros/cons? Best practice?
What is the best way to keep a backup of all the files?
In theory, you don't need to. S3 is designed for 99.999999999% (11 nines) durability, and your data is already stored redundantly across multiple data centers.
If you're really worried about accidentally deleting the files, lock it down with IAM: deny the delete operations for each IAM user, and/or turn on bucket versioning and remove IAM users' ability to permanently delete object versions.
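A minimal sketch of that setup with boto3 (the bucket name is a placeholder, and the blanket deny-to-everyone is just to illustrate; scope the principal to match your own IAM setup):

```python
import json
import boto3

s3 = boto3.client("s3")
bucket = "my-uploads-bucket"  # placeholder bucket name

# 1. Versioning: a "delete" now only adds a delete marker;
#    older versions of the object remain recoverable.
s3.put_bucket_versioning(
    Bucket=bucket,
    VersioningConfiguration={"Status": "Enabled"},
)

# 2. Deny permanent deletes (removal of old versions) via a
#    bucket policy; an account admin can lift it if truly needed.
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Deny",
        "Principal": "*",
        "Action": "s3:DeleteObjectVersion",
        "Resource": f"arn:aws:s3:::{bucket}/*",
    }],
}
s3.put_bucket_policy(Bucket=bucket, Policy=json.dumps(policy))
```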
If you still want a backup, EBS or S3 is pretty trivial to implement: just run an S3 sync utility to copy between buckets or down to an EBS volume. (There are a lot of them, and it's trivial to write your own.) Note that you pay for the full provisioned size of an EBS volume whether you use it or not, so it's probably the more expensive option if your data is growing. I wouldn't use EBS unless you really need local access to the files.
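For illustration, a naive bucket-to-bucket sync in boto3 (bucket names are placeholders; a real sync tool would compare ETags/sizes and skip unchanged objects rather than re-copying everything on every run):

```python
import boto3

s3 = boto3.resource("s3")
src = "my-uploads-bucket"   # placeholder: the live bucket
dst = "my-uploads-backup"   # placeholder: the backup bucket

for obj in s3.Bucket(src).objects.all():
    # Managed server-side copy: the object data stays inside S3
    # and never transits the machine running this script.
    s3.meta.client.copy(
        {"Bucket": src, "Key": obj.key},
        dst,
        obj.key,
    )
```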
The upside of the bucket-to-bucket sync is that you can quickly switch your app over to the backup bucket.
You could also use Glacier to back up your files, but it has some severe limitations: retrievals take hours and restoring data costs extra, so it only suits archives you rarely expect to touch.
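If you do go that route, a lifecycle rule can handle the archiving automatically. A sketch, assuming a backup bucket and a 30-day transition window (both placeholders):

```python
import boto3

s3 = boto3.client("s3")

# Lifecycle rule: move every object in the backup bucket to the
# Glacier storage class 30 days after it is created.
s3.put_bucket_lifecycle_configuration(
    Bucket="my-uploads-backup",  # placeholder bucket name
    LifecycleConfiguration={
        "Rules": [{
            "ID": "archive-to-glacier",
            "Filter": {"Prefix": ""},  # apply to all objects
            "Status": "Enabled",
            "Transitions": [{"Days": 30, "StorageClass": "GLACIER"}],
        }]
    },
)
```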