I am trying to submit multiple jobs to the EMR cluster, but only the first one is in the RUNNING state and the rest are in the ACCEPTED state. Most of my jobs are streaming jobs.
I have the following queries:
I am using Java for development. Any inputs will be really helpful.
To submit a Spark step using the console: Open the Amazon EMR console at https://console.aws.amazon.com/elasticmapreduce/ . In the Cluster List, choose the name of your cluster. Scroll to the Steps section and expand it, then choose Add step.
Inside a given Spark application (SparkContext instance), multiple parallel jobs can run simultaneously if they were submitted from separate threads. By “job”, in this section, we mean a Spark action (e.g. save , collect ) and any tasks that need to run to evaluate that action.
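A minimal Java sketch of the "separate threads" idea, assuming you already hold one shared context in your driver. 'ParallelActions' and 'runConcurrently' are hypothetical names, and the placeholder callables stand in for real Spark actions (for example 'rddA.count()'), so this shows only the threading side, not Spark itself:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper: runs each action on its own pool thread, so that inside
// one Spark application the jobs those actions trigger can be scheduled concurrently.
final class ParallelActions {

    private ParallelActions() {}

    static <T> List<T> runConcurrently(List<Callable<T>> actions, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // invokeAll blocks until every action has completed; futures keep input order.
            List<Future<T>> futures = pool.invokeAll(actions);
            List<T> results = new ArrayList<>();
            for (Future<T> future : futures) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for actions", e);
        } catch (ExecutionException e) {
            // Rethrow the cause of the first failed action, in list order.
            throw new IllegalStateException("An action failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Placeholders; in a Spark driver these could be () -> rddA.count(), () -> rddB.count().
        List<Callable<Long>> actions = List.of(() -> 1L, () -> 2L, () -> 3L);
        System.out.println(runConcurrently(actions, 3)); // [1, 2, 3]
    }
}
```

Whether those jobs actually overlap still depends on Spark's scheduler and on free executors in the cluster.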
You can start as many clusters as you like. When you get started, you are limited to 20 instances across all your clusters. If you need more instances, complete the Amazon EC2 instance request form.
If the steps on your EMR cluster do not depend on each other, you can use the EMR feature called Concurrency to solve your use case. This feature simply means that you can run more than one step in parallel at a time.
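To picture what the concurrency level controls, here is a toy Java model (not EMR's actual scheduler): a fixed pool of 'level' workers means at most 'level' steps are running at once, and the rest wait, much like steps sitting in PENDING. With the default level of 1, steps run strictly one after another. 'StepConcurrencyModel' and 'peakRunning' are hypothetical names, and the model ignores YARN capacity: steps that run concurrently still share the cluster's resources.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Toy model: a pool of `level` workers lets at most `level` dummy steps run at once.
final class StepConcurrencyModel {

    private StepConcurrencyModel() {}

    /** Runs `steps` dummy steps and returns the peak number that were running together. */
    static int peakRunning(int steps, int level) {
        ExecutorService workers = Executors.newFixedThreadPool(level);
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        for (int i = 0; i < steps; i++) {
            workers.submit(() -> {
                int now = running.incrementAndGet();
                peak.accumulateAndGet(now, Math::max);
                try {
                    Thread.sleep(50); // pretend the step does some work
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    running.decrementAndGet();
                }
            });
        }
        workers.shutdown();
        try {
            workers.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return peak.get();
    }

    public static void main(String[] args) {
        System.out.println("level 1 -> peak " + peakRunning(5, 1)); // level 1 -> peak 1
        System.out.println("level 3 -> peak " + peakRunning(5, 3)); // never more than 3
    }
}
```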
This feature is available from EMR release 5.28.0 onward. If you are using an older release, you cannot use it.
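If your Java code picks the release label, a small guard can catch an older release before you rely on concurrency. This is a sketch with a hypothetical 'EmrRelease' class; it assumes labels of the form "emr-X.Y.Z" and, following the answer above, treats every release at or above 5.28.0 as supporting the feature:

```java
// Hypothetical helper: checks a release label such as "emr-5.28.0" against 5.28.0.
final class EmrRelease {

    private EmrRelease() {}

    static boolean supportsStepConcurrency(String releaseLabel) {
        if (!releaseLabel.startsWith("emr-")) {
            throw new IllegalArgumentException("Expected a label like emr-5.28.0: " + releaseLabel);
        }
        String[] parts = releaseLabel.substring("emr-".length()).split("\\.");
        int major = Integer.parseInt(parts[0]);
        int minor = parts.length > 1 ? Integer.parseInt(parts[1]) : 0;
        // Any later major release, or 5.x with minor >= 28, counts as supported.
        return major > 5 || (major == 5 && minor >= 28);
    }

    public static void main(String[] args) {
        for (String label : new String[] {"emr-5.27.0", "emr-5.28.0", "emr-6.1.0"}) {
            System.out.println(label + " -> " + supportsStepConcurrency(label));
        }
    }
}
```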
When you launch the cluster from the AWS console, this feature is labeled 'Concurrency' in the UI. You can choose any number between 1 and 256.
If you launch the cluster from the AWS CLI, the corresponding option is '--step-concurrency-level' (the 'StepConcurrencyLevel' parameter in the EMR API).
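Since you are working in Java, one option is to build the CLI call from your code. This sketch only assembles the argument list for 'aws emr create-cluster'; the class name 'CreateClusterArgs', the cluster name, the instance type and the instance count are placeholders, and the 1 to 256 check mirrors the range given above:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: assembles `aws emr create-cluster` arguments with step concurrency.
final class CreateClusterArgs {

    private CreateClusterArgs() {}

    static List<String> build(String name, String releaseLabel, int concurrency) {
        // Reject values outside 1..256 before calling AWS.
        if (concurrency < 1 || concurrency > 256) {
            throw new IllegalArgumentException("Concurrency must be between 1 and 256, got " + concurrency);
        }
        List<String> args = new ArrayList<>(List.of(
                "aws", "emr", "create-cluster",
                "--name", name,
                "--release-label", releaseLabel,
                "--applications", "Name=Spark",
                "--instance-type", "m5.xlarge", // placeholder instance type
                "--instance-count", "3",        // placeholder cluster size
                "--use-default-roles"));
        args.add("--step-concurrency-level");
        args.add(Integer.toString(concurrency));
        return args;
    }

    public static void main(String[] args) {
        // Prints the command line; this does not contact AWS.
        System.out.println(String.join(" ", build("streaming-jobs", "emr-5.28.0", 5)));
    }
}
```

To actually run it, you could pass the list to 'new ProcessBuilder(args).inheritIO().start()'. For a cluster that is already running, 'aws emr modify-cluster --cluster-id <id> --step-concurrency-level <n>' changes the level without relaunching.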
You can read more about this in the AWS documentation on running multiple steps in parallel in EMR and in the AWS CLI reference.
To answer your second question, about how to handle scheduled jobs:
There are multiple ways to do this. One simple approach is to write a Lambda function that launches the EMR cluster. This Lambda function can then be scheduled with an Amazon CloudWatch Events rule to run at any frequency you want (say, every 15 minutes). You just need to specify a cron expression that determines how often the rule is triggered.
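For reference, CloudWatch Events schedule rules accept either a 'rate(...)' or a six-field 'cron(...)' expression. This sketch, with a hypothetical 'ScheduleExpressions' class, builds both forms for "every N minutes"; it only produces strings and does not create the rule:

```java
// Hypothetical helper: builds CloudWatch Events schedule expressions for "every N minutes".
final class ScheduleExpressions {

    private ScheduleExpressions() {}

    /** rate() form; the unit must be singular when the value is 1. */
    static String everyMinutesRate(int minutes) {
        if (minutes < 1) {
            throw new IllegalArgumentException("minutes must be positive");
        }
        return "rate(" + minutes + (minutes == 1 ? " minute)" : " minutes)");
    }

    /**
     * cron() form, fields: minutes hours day-of-month month day-of-week year.
     * One of day-of-month / day-of-week must be '?', so day-of-week is '?' here.
     * 0/N fires at minutes 0, N, 2N... of each hour, so it is evenly spaced only when N divides 60.
     */
    static String everyMinutesCron(int minutes) {
        if (minutes < 1 || minutes > 59) {
            throw new IllegalArgumentException("minutes must be between 1 and 59");
        }
        return "cron(0/" + minutes + " * * * ? *)";
    }

    public static void main(String[] args) {
        System.out.println(everyMinutesRate(15)); // rate(15 minutes)
        System.out.println(everyMinutesCron(15)); // cron(0/15 * * * ? *)
    }
}
```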
Every time the rule is triggered, it executes your Lambda function, which in turn launches the EMR cluster. In this way you can schedule your jobs.