
Is it possible to launch multiple EC2 instances within R?

Tags: r, amazon-ec2

This is more of a beginner's question. Say I have the following code:

library("multicore")
library("iterators")
library("foreach")
library("doMC")

registerDoMC(16)  # request 16 worker processes

foreach(i = 1:M) %dopar% {
   ## do stuff
}

This code will then run on 16 cores, if they are available. Now, if I understand correctly, a single Amazon EC2 instance gives me only a few cores, depending on the instance type. So if I want to run simulations on 16 cores, I need to use several instances, which, as far as I understand, means launching new R processes. But then I need to write additional code outside of R to gather the results.

So my question is: is there an R package that lets me launch EC2 instances from within R, automagically distributes the load between these instances, and gathers the results in the R session that launched them?

asked Nov 09 '11 10:11 by mpiktas




1 Answer

To be precise, the largest instance type on EC2 currently has 8 cores, so anyone, even users of R, would need multiple instances in order to run concurrently on more than 8 cores.

If you want to use more instances, you have two options for deploying R: "regular" R invocations or MapReduce invocations. In the former case, you will have to set up code to launch instances, distribute tasks (e.g. the independent iterations in foreach), return results, and so on. This is doable, but you're not likely to enjoy it. Here you can use something like rmr or RHIPE to manage a MapReduce grid, or you can use snow and many other HPC tools to create a simple grid. Using snow may make it easier to keep your code intact, but you will have to learn how to tie this stuff together.
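To give a feel for the snow-style "simple grid", here is a minimal sketch using the parallel package that ships with R and includes snow's socket clusters. It assumes the instances are already running and reachable over passwordless SSH; launching them is not handled here. The `hosts` vector and `simulate_once` are hypothetical placeholders, and `"localhost"` is used only so the sketch runs on a single machine.

```r
library(parallel)

# Placeholder worker list: on EC2 these would be the hostnames of instances
# you have already launched and can reach over passwordless SSH.
hosts <- c("localhost", "localhost")

# Start one R worker process per entry in `hosts`.
cl <- makePSOCKcluster(hosts)

# Hypothetical stand-in for one independent simulation run.
simulate_once <- function(i) {
  i^2
}

M <- 8

# Distribute the M independent iterations across the workers and
# collect the results in the calling R session.
results <- parLapply(cl, seq_len(M), simulate_once)

# Shut the workers down when finished.
stopCluster(cl)
```

If you would rather keep the foreach loop, a backend such as doParallel (`registerDoParallel(cl)`) or doSNOW (`registerDoSNOW(cl)`) can register a cluster like this one so that `%dopar%` sends iterations to it.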

In the latter case, you can build upon infrastructure that Amazon has provided, such as Elastic MapReduce (EMR), and packages that make that simpler, such as JD's segue. Like others, I'd recommend segue as a good starting point, as it has a gentler learning curve. The developer is also on SO, so you can easily query him when it breaks.

answered Oct 22 '22 23:10 by Iterator
