
Why are AWS Batch Jobs stuck in RUNNABLE?

I use a compute environment of 0-256 m3.medium on-demand instances. My job definition requires 1 vCPU and 3 GB of RAM, which an m3.medium has.

What are possible reasons why AWS Batch Jobs are stuck in state RUNNABLE?

AWS says:

A job that resides in the queue, has no outstanding dependencies, and is therefore ready to be scheduled to a host. Jobs in this state are started as soon as sufficient resources are available in one of the compute environments that are mapped to the job’s queue. However, jobs can remain in this state indefinitely when sufficient resources are unavailable.

But that does not answer my question.

arm asked Jan 08 '18 13:01



2 Answers

There are other reasons why a Job can get stuck in RUNNABLE:

  • Insufficient permissions for the role associated with the Compute Environment.
  • No internet access from the Compute Environment instance. You will need to associate a NAT gateway or internet gateway with the Compute Environment's subnet.
    • Make sure to check the "Enable auto-assign public IPv4 address" setting on your Compute Environment's subnet. (Pointed out by @thisisbrians in the comments.)
  • Problems with your image. You need to use an ECS-optimized AMI or make sure the ECS container agent is working. More info in the AWS docs.
  • You're trying to launch instances of a type for which your account's limit is 0 instances (EC2 console > Limits, in the left menu). (Read more in gergely-danyi's comment.)
  • And, as mentioned, insufficient resources.

Also, make sure to read the AWS Batch troubleshooting guide.
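Some of the causes above show up in the compute environment's own description. The following is a minimal sketch, assuming boto3 is installed and credentials are configured; the helper names `diagnose_compute_environment` and `fetch_compute_environment` are made up for illustration. It only inspects fields returned by `describe_compute_environments`, so it cannot detect networking or AMI problems.

```python
def diagnose_compute_environment(ce):
    """Return hints for one entry of the `computeEnvironments` list
    returned by the Batch DescribeComputeEnvironments API."""
    hints = []
    if ce.get("state") != "ENABLED":
        hints.append("compute environment is not ENABLED")
    if ce.get("status") != "VALID":
        # statusReason often names the failing role or resource
        reason = ce.get("statusReason", "no reason given")
        hints.append(f"status is {ce.get('status')}: {reason}")
    resources = ce.get("computeResources", {})
    if resources.get("maxvCpus", 0) == 0:
        hints.append("maxvCpus is 0, so no instances can be launched")
    if not resources.get("subnets"):
        hints.append("no subnets configured")
    return hints


def fetch_compute_environment(name):
    """Fetch the description of a single compute environment by name."""
    import boto3  # imported lazily so the checks above work without AWS access

    batch = boto3.client("batch")
    response = batch.describe_compute_environments(computeEnvironments=[name])
    return response["computeEnvironments"][0]
```

For example, `diagnose_compute_environment(fetch_compute_environment("my-env"))` would print nothing suspicious for a healthy environment; `"my-env"` is a placeholder name. An `INVALID` status usually points at the role or permission problems from the first bullet.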

nachoab answered Oct 08 '22 00:10


At a minimum, the roles should be defined with the following policies and trust relationships. Otherwise, jobs will get stuck in RUNNABLE because the roles don't have enough privileges to start them:

AWSBatchServiceRole

  • Attached policies: AWSBatchServiceRole
  • Trusted relationship: batch.amazonaws.com

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Principal": {
            "Service": "batch.amazonaws.com"
          },
          "Action": "sts:AssumeRole"
        }
      ]
    }

ecsInstanceRole

  • Attached policies: AmazonEC2ContainerServiceforEC2Role
  • Trusted relationship: ec2.amazonaws.com

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Principal": {
            "Service": "ec2.amazonaws.com"
          },
          "Action": "sts:AssumeRole"
        }
      ]
    }
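
The two trust relationships differ only in the service principal, so they can be generated from one template. This is a minimal sketch in Python; the function name `trust_policy` is hypothetical and not part of any AWS SDK.

```python
import json


def trust_policy(service_principal):
    """Build a trust policy that lets one AWS service assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": service_principal},
                "Action": "sts:AssumeRole",
            }
        ],
    }


# Trust relationship for AWSBatchServiceRole
batch_trust = json.dumps(trust_policy("batch.amazonaws.com"), indent=2)
# Trust relationship for ecsInstanceRole
ecs_instance_trust = json.dumps(trust_policy("ec2.amazonaws.com"), indent=2)
```

Assuming boto3 is used to create the roles, either JSON string could be passed as the `AssumeRolePolicyDocument` argument of the IAM client's `create_role` call; the managed policies listed above still have to be attached separately.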

Pau answered Oct 07 '22 23:10