 

Tricks to make an AWS spot instance "persistent"?

My client runs his servers on AWS. One problem he keeps hitting is that when the spot price rises above his bid, his spot instances are terminated. That would not be such a big deal, except that spot instances aren't persistent, so we have to restore from an image every time this happens.

What he wants me to do is write something that checks for terminated instances every X amount of time and restarts them automatically. More importantly, he wants some way to feign "persistence". The best idea I have is to create an image of each server every Y amount of time and then boot from that image if and when the instance is terminated.
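For anyone picturing what that "check every X" step might look like, here is a minimal sketch, not the code I ended up writing. It assumes `ec2` is a boto3 EC2 client (or anything exposing the same `describe_spot_instance_requests` and `request_spot_instances` calls). The `launch_specs` mapping (old request ID to the launch specification you want to relaunch with, including the ID of your latest image) and the `handled` set are hypothetical bookkeeping you would have to maintain yourself.

```python
from typing import Any, Dict, List, Set

# Spot request status codes that mean AWS, not the user, ended the instance.
INTERRUPTED_CODES = {
    "instance-terminated-by-price",
    "instance-terminated-no-capacity",
    "instance-terminated-capacity-oversubscribed",
}


def find_interrupted_requests(ec2: Any) -> List[Dict[str, Any]]:
    """Return the spot requests whose instance was terminated by AWS."""
    response = ec2.describe_spot_instance_requests()
    return [
        req
        for req in response.get("SpotInstanceRequests", [])
        if req.get("Status", {}).get("Code") in INTERRUPTED_CODES
    ]


def relaunch_interrupted(
    ec2: Any,
    launch_specs: Dict[str, Dict[str, Any]],  # hypothetical: request ID -> launch spec
    max_price: str,
    handled: Set[str],  # hypothetical: request IDs already relaunched
) -> List[str]:
    """Submit one new spot request per interrupted request; return the new IDs."""
    new_ids = []
    for req in find_interrupted_requests(ec2):
        req_id = req["SpotInstanceRequestId"]
        spec = launch_specs.get(req_id)
        # Skip requests we already replaced or have no launch spec for.
        if req_id in handled or spec is None:
            continue
        response = ec2.request_spot_instances(
            SpotPrice=max_price,
            InstanceCount=1,
            Type="one-time",
            LaunchSpecification=spec,
        )
        new_ids.append(response["SpotInstanceRequests"][0]["SpotInstanceRequestId"])
        handled.add(req_id)
    return new_ids
```

You would call `relaunch_interrupted` from a cron job or a sleep loop every X minutes. The `handled` set is there so that a request that stays in an interrupted state is not relaunched again on every pass.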

Any other ideas would be nice to hear. I guess my question is, am I on the right track here, and do you guys know of any solutions for this that may already exist?

UPDATE: Almost a year later, I come back here to find all these wonderful responses and much more attention to the topic than I'd ever anticipated. A lot of the answers below, while informative and helpful, question my reasoning. I want to state that, even at the time, I agreed 100% that this was not a wise idea, but it was what my client demanded, despite every attempt on my part to steer things in a better direction.

Thank you all very much for your help. I did end up figuring out how to do exactly what I wanted, and was able to write some code that automatically relaunches terminated instances. It was never easy, but it worked well by the time I moved on to a new client.

Good luck to any of you with the same problem. You're undertaking (possibly by force, as was my case) something that won't be easy. Spot instances are cheaper, as some folks here alluded to in their responses, specifically because persistence is not offered. Otherwise, I imagine the spot market would be priced much differently.

All the same, it is possible, I did it, and it was a great experience. When there isn't a way, you have to forge it! If you don't, someone else will.

UPDATE II: I just want to remind everybody that this is something I was essentially tasked with. While many people dismissed the entire concept at the time, I ended up with a more-or-less functional SaaS that lets one easily manage and monitor all of one's spot instances, including the ability to enable or disable automatic persistent relaunch per instance, schedule the times at which individual instances should or should not be started, etc.

I absolutely agree that, from a developer's point of view, it is an inelegant demand, and at the time I did not want to do it. Still, I'd say it was kind of nice, in a way, to be made to work on it. I learned a lot, I gained a lot of confidence in my ability and my code, and I produced a really useful and, as far as I know, very valuable piece of software for my client (even if they were asking for the wrong things because they didn't know better).

I tried to talk him out of it, but he insisted, and since he was the one paying, I focused my attention there and not only accomplished what many here dismissed as silly but made it profitable for someone.

If it were that silly, it wouldn't have saved anyone money.

Look, I read this post now and cringe a little. I was a lot more naive then. I know AWS a lot better now, I code a lot better now, etc. Naturally.

But I am still proud of solving this one, especially since it was my older, much more experienced, and undoubtedly great fellow programmers who were telling me it couldn't or shouldn't be done. You were the ones who made it a challenge for me, so thank you!

What if it can be done profitably? Are you sure that it shouldn't?

Ethan Barron asked Oct 24 '13 19:10



1 Answer

We ended up finding a solution, and here is what we had to do. I'm going to list it step by step, to make it easier to recreate for anyone looking for a similar solution.

  1. Create a new spot instance request. Make sure to uncheck "Delete on Termination" for the root device, so that the volume stays behind when the instance is terminated in the next step. Make sure to note the architecture (we always use x86_64) and the kernel ID that your instance is using (very important!).
  2. Now, SSH into your new instance and create a file or make some other change, so you can see the effect of persistence first-hand. After making some changes to the filesystem, log out of the SSH connection and terminate the instance.
  3. Awesome. Now, go to your EC2 web console and find the volume that was being used by the instance we just terminated. Right-click the volume and select "Create Image". Follow the wizard, making certain to select the same architecture and kernel ID that we noted earlier.
  4. Now, start the spot request wizard using your new image. Follow the wizard, again making certain to uncheck "Delete on Termination". Additionally, and this is the step that is easy to miss, make sure to expand the collapsed section titled 'Advanced Options' and set the correct kernel ID again.

If you follow the above steps to a T, you will have a new instance in the same state your old instance was in when it was terminated. In other words, we have achieved some form of persistence.
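For those who would rather script this than click through the console, here is a rough sketch of the parameters involved. It is one possible API-level reading of the steps above, not the exact code we ran. It assumes boto3, assumes the leftover volume has first been snapshotted (for example with `create_snapshot`), and assumes the root device is `/dev/sda1`. The helper names are made up for illustration, and the helpers only build the keyword arguments, so nothing here touches AWS by itself.

```python
from typing import Any, Dict

ROOT_DEVICE = "/dev/sda1"  # assumption: check the root device name of your own instance


def register_image_params(
    name: str, snapshot_id: str, kernel_id: str, architecture: str = "x86_64"
) -> Dict[str, Any]:
    """Keyword arguments for ec2.register_image (step 3)."""
    return {
        "Name": name,
        "Architecture": architecture,  # must match the original instance
        "KernelId": kernel_id,         # the kernel ID noted in step 1
        "RootDeviceName": ROOT_DEVICE,
        "BlockDeviceMappings": [
            {
                "DeviceName": ROOT_DEVICE,
                "Ebs": {"SnapshotId": snapshot_id, "DeleteOnTermination": False},
            }
        ],
    }


def spot_request_params(
    image_id: str, kernel_id: str, instance_type: str, max_price: str
) -> Dict[str, Any]:
    """Keyword arguments for ec2.request_spot_instances (step 4)."""
    return {
        "SpotPrice": max_price,
        "InstanceCount": 1,
        "LaunchSpecification": {
            "ImageId": image_id,
            "InstanceType": instance_type,
            "KernelId": kernel_id,  # set the kernel ID again, as in 'Advanced Options'
            "BlockDeviceMappings": [
                # Keep the root volume around after the next termination too.
                {"DeviceName": ROOT_DEVICE, "Ebs": {"DeleteOnTermination": False}}
            ],
        },
    }
```

With a real client you would pass these along as `ec2.register_image(**register_image_params(...))`, take the `ImageId` from the response, and then call `ec2.request_spot_instances(**spot_request_params(...))`. The snapshot has to finish before the image can be registered.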

Ethan Barron answered Sep 20 '22 21:09