
How to run EMR Cluster Steps concurrently?

Hi, I have an EMR cluster. Whenever I submit "steps" to it, it runs them sequentially. Is there any way to run steps concurrently?

Or is the appropriate use case to spin up multiple clusters at the same time if you want concurrency?

Sean Bollin asked Oct 15 '14 18:10



1 Answer

The work inside each step is already processed concurrently across the cluster. So if you have work that can be done concurrently, consider putting it all in the same step (each step can contain one or more Hadoop jobs).
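As a rough illustration of packing concurrent work into one step, here is a minimal sketch of a driver script that a single step could run. It launches several commands in parallel and waits for all of them. The function name `run_jobs_concurrently` and the placeholder commands are assumptions made for this example, not part of EMR; on a real cluster the commands might be `hadoop jar ...` invocations.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_jobs_concurrently(commands, max_workers=4):
    """Launch each command as its own process and wait for all of them.

    `commands` is a list of argument lists (hypothetical job launchers).
    Returns the exit codes in the same order as `commands`.
    """
    def run(cmd):
        # subprocess.run blocks until the child exits; the thread pool
        # is what lets several children run at the same time.
        return subprocess.run(cmd).returncode

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run, commands))


# Example usage with placeholder commands standing in for real jobs:
#   codes = run_jobs_concurrently([
#       [sys.executable, "-c", "print('job A')"],
#       [sys.executable, "-c", "print('job B')"],
#   ])
#   sys.exit(0 if all(c == 0 for c in codes) else 1)
```

Exiting non-zero when any job fails lets the step itself be marked as failed, so later steps do not run on incomplete output.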

Typically, you would use separate steps when you want to make sure that ALL processing required by the following step has completed before that step begins. A good example is encrypted data, where you might have one step to decrypt the data, one step to process it, and an additional step to re-encrypt it before persistence.
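The decrypt / process / re-encrypt ordering could be sketched with boto3 along these lines. This is only a sketch under assumptions: the step names and the `bash -c` placeholder commands are hypothetical, and `command-runner.jar` assumes an EMR release that provides it. `submit_steps` needs boto3, AWS credentials, and a real cluster ID, so it is defined here but not called.

```python
def build_step(name, args, action_on_failure="CANCEL_AND_WAIT"):
    """Build one step definition in the shape boto3's EMR client expects."""
    return {
        "Name": name,
        # CANCEL_AND_WAIT cancels the remaining steps if this one fails,
        # so later steps never run on incomplete data.
        "ActionOnFailure": action_on_failure,
        "HadoopJarStep": {
            "Jar": "command-runner.jar",  # assumption: available on the cluster
            "Args": list(args),
        },
    }


def build_pipeline_steps():
    """Three steps in order; the commands are placeholders."""
    return [
        build_step("decrypt", ["bash", "-c", "echo decrypt input"]),
        build_step("process", ["bash", "-c", "echo process data"]),
        build_step("re-encrypt", ["bash", "-c", "echo re-encrypt output"]),
    ]


def submit_steps(cluster_id, steps):
    """Submit the steps to an existing cluster and return their step IDs."""
    import boto3  # imported lazily so the builders work without boto3

    emr = boto3.client("emr")
    response = emr.add_job_flow_steps(JobFlowId=cluster_id, Steps=steps)
    return response["StepIds"]
```

Because the steps are submitted as an ordered list, each one acts as a barrier: processing starts only after decryption has finished, and re-encryption only after processing.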

Mike Brant answered Sep 27 '22 22:09
