
Easiest way to spin up an Amazon EC2 cluster for use as a foreach backend

I want to start a cluster of Amazon EC2 machines for use as a backend for the foreach package in R. Ideally, I could do this all from the command line in R on my local machine, sending the relevant data and commands from the local R session to the remote cluster.

I know the AWS package will help with this task, but I don't really know what to do after running the startCluster() command. Segue also gets me part of the way there, but it's not a backend for foreach, it doesn't seem to support custom AMIs, and it doesn't currently support Windows. There is also the deathstar package, which I haven't explored in depth.

Has anyone else come up with a solution to this problem?

Zach asked Nov 17 '11 20:11



1 Answer

Zach, the simple answer is that there's not a simple path to there from here :)

When I wrote Segue I hoped that someone would soon come out with something that would make Segue obsolete. Cloudnumbers may be it one day, but probably not yet. I have toyed with making Segue a foreach backend, but since I don't use it that way, my motivation has been pretty low to take the time to learn how to build the backend.

One of the things that is very promising, in my opinion, is using the doRedis package with workers on Amazon EC2. doRedis uses a Redis server as the job controller: workers connect to the Redis server, pull jobs from it, and return their results to it. I've been thinking for a while that it would be nice to have a dead simple way to deploy a doRedis cluster on EC2, but nobody has written one yet that I know of.
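To make the doRedis idea concrete, here is a minimal sketch of what the two sides might look like once a Redis server and some workers already exist. It does not launch any EC2 instances. The host name, queue name, and the run_demo helper are made up for illustration, and it assumes doRedis is installed on both the local machine and every worker:

```r
# Sketch only: assumes a Redis server is already running and reachable,
# and that the foreach and doRedis packages are installed everywhere.
library(foreach)
library(doRedis)

# On each EC2 worker you would run something like the line below.
# It blocks and pulls tasks from the named queue until the queue is removed.
# redisWorker(queue = "jobs", host = "redis.example.com", port = 6379)

# On the local machine: register the Redis queue as the foreach backend,
# run a loop, then clean up. run_demo is a hypothetical helper name.
run_demo <- function(host, queue = "jobs", port = 6379) {
  registerDoRedis(queue, host = host, port = port)
  on.exit(removeQueue(queue))  # tidy up the queue even if the loop fails
  foreach(i = 1:10, .combine = c) %dopar% sqrt(i)
}

# Hypothetical host; replace with the address of your own Redis server.
# results <- run_demo("redis.example.com")
```

The part this leaves out is exactly the part that is missing today: starting the instances, installing R and doRedis on them, and pointing each worker at the Redis host.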

JD Long answered Oct 21 '22 11:10