 

AWS - HA NFS - Best practices

Does anyone have a sound strategy for implementing NFS on AWS so that it isn't a SPoF (single point of failure), or at the very least can recover quickly if an instance crashes?

I've read this SO post about sharing files between multiple EC2 instances, but it doesn't answer how to make NFS highly available on AWS, only that NFS can be used.

A lot of online resources say that AWS EFS is available, but it is still in preview and only available in the Oregon region. Our primary VPC is in N. California, so we can't use this option.

Other online resources say that GlusterFS is the way to go, but after some research I just don't feel comfortable implementing this solution due to race conditions and performance concerns.

Another option is SoftNAS, but I want to avoid bringing an unknown AMI into a tightly controlled, homogeneous environment.

Which leaves NFS. NFS is what we use in our dev environment and it works fine, but it's dev: if it crashes, we go get a couple of beers while systems fixes the problem. On production, this is obviously a no-go.

The best solution I can come up with at this point is to create an EBS volume and two EC2 instances. Both instances will be updated as normal (via puppet) to maintain stack alignment (kernel, nfs libs etc.), but only one instance will mount the EBS volume. We set up a monitor on the active NFS instance, and if it goes down, we are notified and manually detach the volume and attach it to the backup EC2 instance. I'm thinking we also create a network interface that can be detached and re-attached the same way, so we only need to maintain a single IP in DNS.

Although I suppose we could do this automatically with keepalived and an IAM policy that allows the automatic detachment/re-attachment.
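The automated variant could be scripted against the EC2 API. A minimal sketch, assuming a boto3 EC2 client is passed in as `ec2`; `fail_over` is a hypothetical helper name and all IDs and the device name are placeholders. Something that detects the failure (for example a keepalived `notify_master` script on the standby) would call it. The IAM policy would need at least `ec2:DescribeVolumes`, `ec2:DescribeInstances`, `ec2:DescribeNetworkInterfaces`, `ec2:DetachVolume`, `ec2:AttachVolume`, `ec2:DetachNetworkInterface` and `ec2:AttachNetworkInterface`. It only moves the volume and the interface; mounting the filesystem and starting the NFS server on the standby is not covered.

```python
# Hypothetical failover helper: move the NFS data volume and a secondary ENI
# from a failed instance to a standby. `ec2` is assumed to be a boto3 EC2
# client (boto3.client("ec2")); all IDs and the device name are placeholders.


def fail_over(ec2, volume_id, eni_id, failed_id, standby_id,
              device="/dev/xvdf", eni_device_index=1):
    # An EBS volume can only attach to an instance in its own AZ (an ENI is
    # likewise tied to its subnet's AZ), so compare the volume's AZ with the
    # standby's before changing anything.
    vol = ec2.describe_volumes(VolumeIds=[volume_id])["Volumes"][0]
    res = ec2.describe_instances(InstanceIds=[standby_id])["Reservations"]
    standby_az = res[0]["Instances"][0]["Placement"]["AvailabilityZone"]
    if vol["AvailabilityZone"] != standby_az:
        raise ValueError(f"{volume_id} is in {vol['AvailabilityZone']}, "
                         f"standby {standby_id} is in {standby_az}")

    # Force the detach: a crashed instance may never release the volume.
    # AWS treats Force as a last resort; unflushed writes can be lost.
    ec2.detach_volume(VolumeId=volume_id, InstanceId=failed_id, Force=True)
    ec2.get_waiter("volume_available").wait(VolumeIds=[volume_id])
    ec2.attach_volume(VolumeId=volume_id, InstanceId=standby_id, Device=device)

    # Move the ENI so clients keep using the same private IP.
    nics = ec2.describe_network_interfaces(NetworkInterfaceIds=[eni_id])
    attachment = nics["NetworkInterfaces"][0].get("Attachment")
    if attachment:
        ec2.detach_network_interface(AttachmentId=attachment["AttachmentId"],
                                     Force=True)
        ec2.get_waiter("network_interface_available").wait(
            NetworkInterfaceIds=[eni_id])
    ec2.attach_network_interface(NetworkInterfaceId=eni_id,
                                 InstanceId=standby_id,
                                 DeviceIndex=eni_device_index)
```

Passing the client in, rather than creating it inside the function, lets the caller choose region and credentials and keeps the helper testable with a stub.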

--UPDATE--

It looks like EBS volumes are tied to specific availability zones, so re-attaching to an instance in another AZ is impossible. The only other option I can think of is:

  1. Create an EC2 instance in each AZ, in a public subnet (each with an EIP)
  2. Create a Route 53 health check for TCP:2049
  3. Create Route 53 failover policies for nfs-1 (AZ1) and nfs-2 (AZ2)
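Steps 2 and 3 map onto the Route 53 API. The sketch below only builds the request payloads; the record name, IPs and helper names are placeholders, and it assumes the results are passed to boto3's `route53.create_health_check(CallerReference=..., HealthCheckConfig=...)` and `route53.change_resource_record_sets(HostedZoneId=..., ChangeBatch=...)`.

```python
def tcp_health_check_config(ip, port=2049):
    # HealthCheckConfig for create_health_check. A TCP check only proves the
    # NFS port accepts connections, not that the exports actually work.
    return {
        "IPAddress": ip,
        "Port": port,
        "Type": "TCP",
        "RequestInterval": 30,  # seconds between checks (10 or 30)
        "FailureThreshold": 3,  # consecutive failures before "unhealthy"
    }


def failover_change_batch(name, primary_ip, secondary_ip,
                          primary_hc_id, secondary_hc_id, ttl=60):
    # ChangeBatch for change_resource_record_sets: two A records with the
    # same name, PRIMARY on nfs-1 (AZ1) and SECONDARY on nfs-2 (AZ2).
    def record(set_id, role, ip, hc_id):
        return {
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": name,
                "Type": "A",
                "SetIdentifier": set_id,
                "Failover": role,
                "TTL": ttl,  # keep low so new lookups follow a failover quickly
                "ResourceRecords": [{"Value": ip}],
                "HealthCheckId": hc_id,
            },
        }

    return {"Changes": [
        record("nfs-1", "PRIMARY", primary_ip, primary_hc_id),
        record("nfs-2", "SECONDARY", secondary_ip, secondary_hc_id),
    ]}
```

Two caveats with this design: Route 53's health checkers connect from the internet, so TCP 2049 has to be reachable from their published IP ranges (hence the public IPs); and an NFS client that already has the share mounted generally keeps using the address it resolved at mount time, so DNS failover mostly helps clients that remount.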

The only question here is, what's the best way to keep the two NFS servers in sync? Just cron an rsync script between them?
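A cron-driven rsync would give a one-way, periodic copy. A minimal sketch that builds the command and a user-crontab line; the paths, the `nfs-2` host name, the lock file and the 5-minute interval are assumptions, as is SSH key authentication from the primary to the standby.

```python
import shlex


def rsync_argv(src_dir, standby_host, dst_dir):
    # -a: archive mode (permissions, times, symlinks, owners), -H: keep hard
    # links, --delete: remove files on the standby deleted on the primary.
    # The trailing slash on the source copies its contents, not the dir itself.
    return ["rsync", "-aH", "--delete",
            src_dir.rstrip("/") + "/",
            f"{standby_host}:{dst_dir.rstrip('/')}/"]


def cron_line(argv, minutes=5, lock="/var/lock/nfs-rsync.lock"):
    # User-crontab entry (no user field). `flock -n` skips a run if the
    # previous rsync still holds the lock, so runs never overlap.
    return f"*/{minutes} * * * * flock -n {lock} " + shlex.join(argv)
```

Anything written since the last run is lost on failover, so the interval is effectively the RPO. Once clients have failed over to the standby, the job also has to be stopped or reversed before the old primary comes back, otherwise `--delete` removes files that only exist on the standby.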

Or is there a best practice that I am completely missing?

Mike Purcell asked Oct 18 '22 23:10



1 Answer

There are a few options for building a highly available NFS server. That said, I prefer using EFS or GlusterFS, because all of these solutions have their downsides.

a) DRBD: It is possible to synchronize volumes with DRBD, which mirrors your data between two servers. Use two EC2 instances in different availability zones for high availability. Downside: configuration and operation are complex.

b) EBS Snapshots: If an RPO of more than 30 minutes is acceptable, you can use periodic EBS snapshots to recover from an outage in another availability zone. This can be achieved with an Auto Scaling group running a single EC2 instance, a user-data script, and a cron job for periodic EBS snapshots. Downside: RPO > 30 min.

c) S3 Synchronisation: It is possible to synchronize the state of an EC2 instance acting as the NFS server to S3. The standby server uses S3 to stay up to date. Downside: syncing lots of small files to S3 will take too long.
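Options b) and c) are mostly scheduling glue around existing AWS tools. A minimal sketch of both, with hypothetical helper names, placeholder IDs and an assumed `nfs-backup` tag: `snapshot_volume` would run from cron (every 30 minutes, say) with a boto3 EC2 client, `snapshots_to_prune` picks old snapshots to delete from the list returned by `describe_snapshots`, and `s3_sync_argv` builds an `aws s3 sync` command for the primary (push) or the standby (pull).

```python
from datetime import datetime, timezone


def snapshot_volume(ec2, volume_id, now=None):
    # Snapshot of a mounted volume is crash-consistent only, like pulling
    # the power cord. The "nfs-backup" tag key is an assumption.
    if now is None:
        now = datetime.now(timezone.utc)
    resp = ec2.create_snapshot(
        VolumeId=volume_id,
        Description=f"nfs-backup {now:%Y-%m-%dT%H:%MZ}",
        TagSpecifications=[{
            "ResourceType": "snapshot",
            "Tags": [{"Key": "nfs-backup", "Value": volume_id}],
        }],
    )
    return resp["SnapshotId"]


def snapshots_to_prune(snapshots, keep=48):
    # `snapshots`: dicts with "SnapshotId" and "StartTime" (as returned by
    # describe_snapshots). Keep the newest `keep` (48 x 30 min = one day).
    newest_first = sorted(snapshots, key=lambda s: s["StartTime"], reverse=True)
    return [s["SnapshotId"] for s in newest_first[keep:]]


def s3_sync_argv(local_dir, bucket, prefix, push=True):
    # push=True on the primary (local -> S3), push=False on the standby
    # (S3 -> local). --delete mirrors deletions to the destination.
    remote = f"s3://{bucket}/{prefix.strip('/')}"
    src, dst = (local_dir, remote) if push else (remote, local_dir)
    return ["aws", "s3", "sync", src, dst, "--delete"]
```

In option b), the replacement instance's user-data script would restore the newest snapshot into its own availability zone with `create_volume(SnapshotId=..., AvailabilityZone=...)` before attaching and mounting it. With a 30-minute schedule, roughly the last 30 minutes of writes can be lost, which is where the RPO in b) comes from.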

I recommend watching this talk from AWS re:Invent: https://youtu.be/xbuiIwEOCAs

Andreas answered Nov 15 '22 07:11
