Hi I have an EMR cluster. Whenever I submit "steps" to it, it runs them sequentially. Is there any way to run "steps" concurrently?
Or is the appropriate use case to spin up multiple clusters at the same time if you want concurrency?
Each step's work is already distributed concurrently across the cluster's nodes. So if you have work that can be done in parallel, consider putting it all in the same step (each step can contain one or more Hadoop jobs).
Typically you use separate steps when you want to make sure that ALL processing in one step is complete before the next step begins. A good example is dealing with encrypted data, where you might have one step to decrypt the data, one step to process it, and a final step to re-encrypt it before persistence.
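To make the decrypt → process → re-encrypt pipeline concrete, here is a minimal sketch of how such sequential steps could be defined. The jar paths, arguments, and cluster id are hypothetical placeholders; the `boto3` submission call is shown but commented out since it requires a live cluster:

```python
# Hedged sketch: three EMR steps that run strictly one after another.
# All S3 paths and the cluster id below are hypothetical examples.

def make_step(name, jar, args):
    """Build one EMR step definition; EMR runs steps in submission order."""
    return {
        "Name": name,
        "ActionOnFailure": "CANCEL_AND_WAIT",  # don't start later steps if this fails
        "HadoopJarStep": {"Jar": jar, "Args": args},
    }

steps = [
    make_step("decrypt", "s3://my-bucket/jars/decrypt.jar",
              ["--in", "s3://my-bucket/raw", "--out", "s3://my-bucket/plain"]),
    make_step("process", "s3://my-bucket/jars/process.jar",
              ["--in", "s3://my-bucket/plain", "--out", "s3://my-bucket/results"]),
    make_step("re-encrypt", "s3://my-bucket/jars/encrypt.jar",
              ["--in", "s3://my-bucket/results", "--out", "s3://my-bucket/final"]),
]

# Submitting the whole list at once (not executed here) still runs the
# steps sequentially on the cluster:
# import boto3
# emr = boto3.client("emr")
# emr.add_job_flow_steps(JobFlowId="j-XXXXXXXXXXXXX", Steps=steps)
```

Within each step, the Hadoop job itself fans out across the cluster, so the parallelism lives inside a step, while the step boundaries enforce the ordering.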