Amazon EC2 On-Demand Workers for Short Tasks

Question

I am looking to build a web application which needs to run resource-intensive MCMC (Markov chain Monte Carlo) calculations on-demand in R to generate some probability graphs for the user.

Constraints:

- Obviously I don't want to run the resource-intensive calculations on the same server as the web app front-end, so these tasks need to be handed off to a worker instance.
- These calculations take a good amount of CPU to run and I'd like to keep latency as low as possible (hopefully seconds, not minutes), so I would prefer to run the calculations on beefier hardware.
- I cannot afford to run a beefy EC2 instance at ~66¢/hr x 24hrs/day, so on-demand or spot request instances are probably necessary.
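
For illustration, the hand-off described in the first constraint might look like the minimal sketch below: a worker that pulls one task at a time from a queue. It assumes an SQS-style client exposing `receive_message` and `delete_message` (as boto3's SQS client does). `handle_task`, the `task_id` field, and the queue URL are hypothetical placeholders, not part of the original question.

```python
import json


def handle_task(body):
    """Hypothetical placeholder for the real MCMC job."""
    params = json.loads(body)
    # A real worker would run the R calculation here and store the graphs.
    return {"task_id": params["task_id"], "status": "done"}


def process_one(sqs, queue_url, wait_seconds=20):
    """Receive at most one message, process it, then delete it.

    `sqs` is assumed to be a boto3 SQS client (or any object with the same
    receive_message / delete_message methods). Returns the task result,
    or None if no message was available.
    """
    response = sqs.receive_message(
        QueueUrl=queue_url,
        MaxNumberOfMessages=1,         # one task at a time
        WaitTimeSeconds=wait_seconds,  # long polling
    )
    messages = response.get("Messages", [])
    if not messages:
        return None
    message = messages[0]
    result = handle_task(message["Body"])
    # Delete only after success, so a crashed worker's task becomes visible again.
    sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=message["ReceiptHandle"])
    return result
```

A worker process would call `process_one` in a loop. Passing the client in as an argument keeps the function easy to exercise with a fake queue.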

Here are the options I've come up with:

1. Run a cheap worker instance 24 hours a day which takes one task at a time, managed by Amazon SWF (or SQS).

   Cons:
   - high latency - Cheaper hardware, longer wait times.

2. Spawn a beefier worker instance per task (spun up whenever a job is added to the queue) and terminate the instance upon completion.

   Cons:
   - expensive/wasteful - I'd be paying for an hour on the server each time and only using seconds for my calculation.
   - startup overhead - Would spinning up a new EC2 instance on demand introduce non-negligible latency (offsetting the whole purpose of using beefier hardware)?

3. Like #2, but with low-bid EC2 spot requests.

   Cons:
   - startup overhead - See #2.
   - inconsistency? - I've never worked with spot requests before, so I have no idea how volatile or hands-on such a solution would be. Do I have to continually adjust my bids to make sure I can still get tasks done at peak hours? Also, I suppose I'd have to monitor my processes closely to make sure they aren't interrupted mid-calculation.

4. Some kind of hybrid solution where I actively monitor beefy-hardware worker instances and their loads, and intelligently spin up and terminate instances on the hour to maintain an optimal balance of cost and availability.

   Cons:
   - complicated and costly setup - Unless there's a good managed service out there to handle stuff like this, I'd have to set all of that infrastructure up myself.
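
As a rough sketch of the "terminate on the hour" idea in the hybrid option, the function below picks idle instances whose already-paid hour is about to run out. It assumes each instance is billed in whole hours counted from its launch time, as the question describes. The function name, the `grace_seconds` window, and the input shapes are hypothetical, and actually stopping instances is left to whatever tooling is in use.

```python
def instances_to_terminate(idle_instances, now, queue_depth, grace_seconds=300):
    """Pick idle instances whose billed hour is about to end.

    idle_instances: dict mapping instance id -> launch time (epoch seconds).
    now: current time (epoch seconds).
    queue_depth: number of tasks still waiting; nothing is terminated
    while work is pending.
    """
    if queue_depth > 0:
        return []
    doomed = []
    for instance_id, launched_at in idle_instances.items():
        seconds_into_hour = (now - launched_at) % 3600
        # The current hour is already paid for, so only stop near its end.
        if 3600 - seconds_into_hour <= grace_seconds:
            doomed.append(instance_id)
    return sorted(doomed)
```

Keeping an idle instance until just before its next billing boundary costs nothing extra and leaves it available for any task that arrives in the meantime.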

I wish there was some service where I could pay for highly available on-demand hardware on a minute-to-minute basis rather than hourly.
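
To make the difference concrete, here is a small calculation comparing hourly billing with a hypothetical per-minute billing at the same rate. The 66¢/hr figure comes from the constraints above; the 30-second task length and the per-minute scheme are assumptions for illustration only.

```python
import math

HOURLY_RATE = 0.66  # dollars per hour, the ~66 cents/hr figure from the question


def cost_hourly_billing(task_seconds):
    """Cost when every started hour is billed in full."""
    return math.ceil(task_seconds / 3600) * HOURLY_RATE


def cost_per_minute_billing(task_seconds):
    """Cost under a hypothetical per-minute billing at the same hourly rate."""
    return math.ceil(task_seconds / 60) * HOURLY_RATE / 60


task_seconds = 30  # assumed task length
print(f"hourly billing:     ${cost_hourly_billing(task_seconds):.4f}")      # $0.6600
print(f"per-minute billing: ${cost_per_minute_billing(task_seconds):.4f}")  # $0.0110
```

Under these assumptions, a 30-second task on a dedicated per-task instance costs 60 times more with hourly billing than it would with per-minute billing.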

So my questions are the following:

1. How would you recommend solving this problem?
2. Is there a good EC2 instance managing solution that could sit on top of Amazon SWF and help me load balance and terminate idle workers?
3. Would spot-request bids solve my problem or are they more suited to tasks which don't necessarily need to be completed right away?

jman · Accepted Answer

There's another option that you may not be aware of. I actually just stumbled upon it: http://multyvac.com

I have no experience using it (so I can't vouch for it), but it looks like the first solution I've seen that actually offers true "utility computing". It began with just Python but now supports any language.

Kobi · Answer

"I wish there was some service where I could pay for highly available on-demand hardware on a minute-to-minute basis rather than hourly."

That service is AWS Lambda, which wasn't available when you asked the question:

"Lambda runs your code on high-availability compute infrastructure and performs all the administration of the compute resources, including server and operating system maintenance, capacity provisioning and automatic scaling."

Pricing:

"You are charged based on the number of requests for your functions and the time your code executes."

"Duration is calculated from the time your code begins executing until it returns or otherwise terminates, rounded up to the nearest 100ms."

"The Lambda free tier includes 1M free requests per month and 400,000 GB-seconds of compute time per month."
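
As a rough worked example of the pricing rules quoted above, the sketch below rounds a duration up to the nearest 100ms, converts it to GB-seconds, and estimates how many invocations fit in the 400,000 GB-second free allowance. The memory sizes and durations are assumed values, and treating a GB-second as one second at 1024 MB of configured memory is my reading of the unit rather than something stated in the quote.

```python
import math

FREE_GB_SECONDS = 400_000  # monthly free compute allowance quoted above


def billed_gb_seconds(duration_ms, memory_mb):
    """GB-seconds billed for one invocation, rounding duration up to 100 ms."""
    billed_ms = math.ceil(duration_ms / 100) * 100
    return (billed_ms / 1000) * (memory_mb / 1024)


def free_invocations_per_month(duration_ms, memory_mb):
    """How many such invocations fit in the free compute allowance."""
    return math.floor(FREE_GB_SECONDS / billed_gb_seconds(duration_ms, memory_mb))


# Assumed example: a 1-second calculation with 512 MB of memory.
print(billed_gb_seconds(1000, 512))           # 0.5
print(free_invocations_per_month(1000, 512))  # 800000
```

The free tier also caps requests at 1M per month, so the smaller of the two limits applies in practice.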

You can also wrap a Lambda function with an HTTP endpoint, possibly removing this layer from your application:

"You can invoke a Lambda function over HTTPS by defining a custom RESTful API using Amazon API Gateway. This gives you an endpoint for your function which can respond to REST calls like GET, PUT and POST. Read more about using AWS Lambda with Amazon API Gateway."
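
A minimal sketch of what such a function might look like in Python, assuming API Gateway's proxy integration, where the request body arrives as a JSON string under `event["body"]` and the function returns a dict with `statusCode` and `body`. `run_mcmc` and the `samples` parameter are hypothetical stand-ins for the actual calculation.

```python
import json


def run_mcmc(params):
    """Hypothetical placeholder for the actual calculation."""
    return {"samples": params.get("samples", 1000), "status": "done"}


def lambda_handler(event, context):
    """Entry point in the (event, context) shape Lambda's Python runtime calls.

    Assumes the request body is a JSON object passed as a string.
    """
    try:
        params = json.loads(event.get("body") or "{}")
    except json.JSONDecodeError:
        return {"statusCode": 400, "body": json.dumps({"error": "invalid JSON"})}
    result = run_mcmc(params)
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(result),
    }
```

With this shape the web front-end could POST task parameters straight to the endpoint instead of going through a separate queue, provided the calculation finishes within the function's time limit.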

Caveat: Lambda currently supports only JavaScript, Java, and Python, so I'm not sure how you would get R to work. You may need to host R in one of these runtimes.
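
One way "hosting R in one of these runtimes" might work is to have the Python function shell out to an external R process. The sketch below is only that, a sketch: it assumes an `Rscript` binary and an R script can somehow be bundled with the function (not verified here), and the script name and the JSON-in/JSON-out contract are hypothetical.

```python
import json
import subprocess


def run_external(command, payload, timeout=60):
    """Send `payload` as JSON on stdin to `command` and parse JSON from its stdout.

    `command` is an argument list such as ["Rscript", "mcmc.R"]; the script
    name and the JSON contract are hypothetical.
    """
    completed = subprocess.run(
        command,
        input=json.dumps(payload),
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return json.loads(completed.stdout)
```

A call might look like `run_external(["Rscript", "mcmc.R"], {"samples": 1000})`, with the R script reading its parameters from stdin and printing a JSON result.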

Tags: r, amazon-ec2, amazon-swf, amazon-emr

mikegreiling
