I am looking to build a web application which needs to run resource-intensive MCMC (Markov chain Monte Carlo) calculations on-demand in R to generate some probability graphs for the user.
Constraints:
Obviously I don't want to run the resource-intensive calculations on the same server as the web app front-end, so these tasks need to be handed off to a worker instance.
These calculations take a good amount of CPU to run and I'd like to keep latency as low as possible (hopefully seconds, not minutes), so I would prefer to run the calculations on beefier hardware.
I cannot afford to run a beefy EC2 instance at ~66¢/hr x 24hrs/day, so on-demand or spot request instances are probably necessary.
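For a sense of scale, here is a quick back-of-the-envelope calculation of the always-on cost. The 66¢/hr rate is the figure quoted above; the 30-day month is an assumption made only for this estimate.

```python
# Rough cost of keeping one instance running around the clock.
HOURLY_RATE_USD = 0.66   # figure quoted in the question
HOURS_PER_DAY = 24
DAYS_PER_MONTH = 30      # assumption: a 30-day month


def always_on_cost(hourly_rate, days):
    """Return the cost in USD of running one instance continuously for `days` days."""
    return hourly_rate * HOURS_PER_DAY * days


daily = always_on_cost(HOURLY_RATE_USD, 1)
monthly = always_on_cost(HOURLY_RATE_USD, DAYS_PER_MONTH)
print(f"per day:   ${daily:.2f}")    # per day:   $15.84
print(f"per month: ${monthly:.2f}")  # per month: $475.20
```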
Here are the options I've come up with:
1. Run a cheap worker instance 24hrs a day which takes one task at a time, managed by Amazon SWF (or SQS).
   Cons:
2. Spawn a beefier worker instance per task (spun up whenever a job is added to the queue) and terminate the instance upon completion.
   Cons:
3. Like #2 but with low-bid EC2 spot requests.
   Cons:
4. Some kind of hybrid solution where I actively monitor beefy-hardware worker instances and their loads and intelligently spin up and terminate instances on the hour to maintain an optimal balance of cost and availability.
   Cons:
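The "terminate on the hour" idea in the hybrid option can be sketched as a small decision function. This is only an illustration under the hourly-billing assumption stated in the question: the function name, the 5-minute safety margin, and the way the launch time is passed in are all hypothetical, and the actual termination call to EC2 is not shown.

```python
from datetime import datetime, timedelta

BILLING_PERIOD = timedelta(hours=1)    # hourly billing, as described in the question
SAFETY_MARGIN = timedelta(minutes=5)   # hypothetical: act this long before the hour rolls over


def should_terminate(launched_at, now, is_idle):
    """Return True if an idle worker is close to the end of an already-paid hour."""
    if not is_idle:
        return False
    elapsed = now - launched_at
    # Time already used within the current billing hour.
    into_current_hour = elapsed % BILLING_PERIOD
    remaining = BILLING_PERIOD - into_current_hour
    return remaining <= SAFETY_MARGIN


launched = datetime(2015, 1, 1, 12, 0)
print(should_terminate(launched, datetime(2015, 1, 1, 12, 30), True))   # False
print(should_terminate(launched, datetime(2015, 1, 1, 12, 57), True))   # True
print(should_terminate(launched, datetime(2015, 1, 1, 12, 57), False))  # False
```

Keeping an idle worker alive until just before the hour boundary costs nothing extra under hourly billing, and it stays available for any job that arrives in the meantime.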
I wish there were some service where I could pay for highly available on-demand hardware on a minute-to-minute basis rather than hourly.
So my questions are the following:
How would you recommend solving this problem?
Is there a good EC2 instance-management solution that could sit on top of Amazon SWF and help me load balance and terminate idle workers?
Would spot-request bids solve my problem or are they more suited to tasks which don't necessarily need to be completed right away?
There's another option that you may not be aware of. I actually just stumbled upon it: http://multyvac.com
I have no experience using it (so I can't vouch for it), but it looks like the first solution I've seen that actually offers true "utility computing". It began with just Python but now supports any language.
I wish there were some service where I could pay for highly available on-demand hardware on a minute-to-minute basis rather than hourly.
That service is AWS Lambda, which wasn't available when you asked the question:
Lambda runs your code on high-availability compute infrastructure and performs all the administration of the compute resources, including server and operating system maintenance, capacity provisioning and automatic scaling
Pricing:
You are charged based on the number of requests for your functions and the time your code executes
Duration is calculated from the time your code begins executing until it returns or otherwise terminates, rounded up to the nearest 100ms.
The Lambda free tier includes 1M free requests per month and 400,000 GB-seconds of compute time per month.
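To see what those numbers mean for a job like yours, here is a rough sketch that applies the 100 ms rounding and the free-tier figures quoted above. The 30-second duration and 1024 MB memory setting are hypothetical values chosen for illustration, and no per-request or per-GB-second prices are assumed.

```python
import math

FREE_GB_SECONDS = 400_000   # monthly free tier, from the quoted pricing text
FREE_REQUESTS = 1_000_000   # monthly free requests, from the quoted pricing text
BILLING_STEP_MS = 100       # duration is rounded up to the nearest 100 ms


def billed_ms(duration_ms):
    """Round an execution time up to the next 100 ms increment."""
    return math.ceil(duration_ms / BILLING_STEP_MS) * BILLING_STEP_MS


def gb_seconds(duration_ms, memory_mb):
    """GB-seconds consumed by one invocation at the given memory setting."""
    return (billed_ms(duration_ms) / 1000) * (memory_mb / 1024)


def free_invocations_per_month(duration_ms, memory_mb):
    """How many invocations fit inside the free tier (compute or request limit)."""
    by_compute = int(FREE_GB_SECONDS // gb_seconds(duration_ms, memory_mb))
    return min(by_compute, FREE_REQUESTS)


print(billed_ms(1234))                            # 1300
print(gb_seconds(30_000, 1024))                   # 30.0
print(free_invocations_per_month(30_000, 1024))   # 13333
```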
You can also wrap a Lambda function with an HTTP endpoint, possibly removing the separate queue-and-worker layer from your application:
You can invoke a Lambda function over HTTPS by defining a custom RESTful API using Amazon API Gateway. This gives you an endpoint for your function which can respond to REST calls like GET, PUT and POST. Read more about using AWS Lambda with Amazon API Gateway.
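A client call to such an endpoint might look like the following sketch. The URL, the JSON payload fields, and the function names are all hypothetical placeholders; the real endpoint would come from your own API Gateway deployment.

```python
import json
import urllib.request

# Hypothetical endpoint; API Gateway would give you the real URL.
ENDPOINT = "https://example.invalid/prod/mcmc"


def build_request(params, url=ENDPOINT):
    """Build a POST request carrying the job parameters as JSON."""
    body = json.dumps(params).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_job(params, url=ENDPOINT, timeout=60):
    """Send the request and decode the JSON response (needs a live endpoint)."""
    with urllib.request.urlopen(build_request(params, url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Hypothetical job parameters; nothing is sent until run_job() is called.
req = build_request({"iterations": 10000, "chains": 4})
```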
Caveat: Lambda currently supports only JavaScript, Java, and Python, so I'm not sure how you would get R to work. You may need to host R in one of these runtimes.
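One way to "host R" in a supported runtime would be to have a Python handler shell out to `Rscript`. The sketch below assumes an R installation has been bundled with the function and that `Rscript` is on the PATH, which is a significant assumption I have not verified. The script name `mcmc.R`, the event fields, and the convention of printing JSON to stdout are hypothetical.

```python
import json
import subprocess

R_SCRIPT = "mcmc.R"   # hypothetical R script bundled with the function


def build_command(event):
    """Translate the incoming event into an Rscript command line."""
    return [
        "Rscript",
        R_SCRIPT,
        str(int(event["iterations"])),
        str(int(event["chains"])),
    ]


def handler(event, context):
    """Lambda-style entry point: run the R script and return its parsed output."""
    result = subprocess.run(
        build_command(event),
        capture_output=True,
        text=True,
        check=True,   # raise if R exits with a non-zero status
    )
    # Assumes the R script prints a single JSON document to stdout.
    return json.loads(result.stdout)
```

Passing the arguments as a list (rather than a shell string) avoids shell quoting problems with user-supplied values.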