This is more of a beginner's question. Say I have the following code:
library("multicore")
library("iterators")
library("foreach")
library("doMC")
registerDoMC(16)
foreach(i = 1:M) %dopar% {
##do stuff
}
This code will then run on 16 cores, if they are available. Now, if I understand correctly, a single Amazon EC2 instance gives me only a few cores, depending on the instance type. So if I want to run simulations on 16 cores, I need to use several instances, which, as far as I understand, means launching new R processes. But then I need to write additional code outside of R to gather the results.
So my question is: is there an R package that lets me launch EC2 instances from within R, automagically distributes the load between these instances, and gathers the results in the initial R session?
To be precise, the largest instance type on EC2 currently has 8 cores, so anyone, even users of R, would need multiple instances in order to run concurrently on more than 8 cores.
If you want to use more instances, then you have two options for deploying R: "regular" R invocations or MapReduce invocations. In the former case, you will have to set up code to launch instances, distribute tasks (e.g. the independent iterations in foreach), return results, etc. This is doable, but you're not likely to enjoy it. In this case, you can use something like rmr or RHipe to manage a MapReduce grid, or you can use snow and many other HPC tools to create a simple grid. Use of snow may make it easier to keep your code intact, but you will have to learn how to tie this stuff together.
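To give a feel for the "simple grid" route, here is a minimal sketch of a snow-style socket cluster. It rests on several assumptions: it uses the parallel package that ships with R (which carries the snow-derived functions) rather than snow itself, the worker list is two hypothetical local workers so the sketch runs on one machine, and simulate_one is a made-up stand-in for "do stuff". On EC2 the worker list would instead hold the hostnames of instances you have already launched and can reach over passwordless SSH. Launching the instances is not covered here.

```r
library(parallel)  # ships with R; provides snow-style socket clusters

# Hypothetical worker list: two local workers so the sketch runs anywhere.
# On EC2 this would be the hostnames of already-running instances.
workers <- rep("localhost", 2)

# Start one R worker process per entry in the list.
cl <- makePSOCKcluster(workers)

# Hypothetical stand-in for "do stuff": one independent iteration.
simulate_one <- function(i) i^2

M <- 8

# Distribute the M iterations across the workers and collect the results.
results <- parLapply(cl, seq_len(M), simulate_one)

# Always shut the workers down when finished.
stopCluster(cl)

# Results arrive in the master session as a list, in input order.
total <- sum(unlist(results))
```

If you would rather keep the foreach loop as written, the same kind of cluster object can probably be registered as a foreach backend instead, for example with registerDoSNOW(cl) from the doSNOW package. I have not shown that here, since it needs extra packages installed.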
In the latter case, you can build upon infrastructure that Amazon has provided, such as Elastic MapReduce (EMR), and packages that make that simpler, such as JD's segue. I'd recommend segue as a good starting point, as others have done, as it has a gentler learning curve. The developer is also on SO, so you can easily query him when it breaks.