
Is there a way to set a walltime on AWS Batch jobs?

Is there a way to set a maximum running time for AWS Batch jobs (or queues)? This is a standard setting in most batch managers, which avoids wasting resources when a job hangs for whatever reason.

static_rtti asked Nov 20 '17 09:11




3 Answers

There is no option to set a timeout on a Batch job, but you can set up a Lambda function that triggers every hour or so and terminates jobs created more than, say, 24 hours ago.
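
A minimal sketch of that cleanup function, assuming the `batch` argument behaves like `boto3.client("batch")` (only `list_jobs` and `terminate_job` are called). The helper names, the termination reason string, and the 24-hour default are illustrative, not part of the original answer.

```python
import time


def find_stale_job_ids(job_summaries, now_ms, max_age_seconds):
    """Return IDs of jobs whose createdAt (epoch milliseconds) is before the cutoff."""
    cutoff_ms = now_ms - max_age_seconds * 1000
    return [job["jobId"] for job in job_summaries if job["createdAt"] < cutoff_ms]


def terminate_stale_jobs(batch, job_queue, max_age_seconds=24 * 3600, now_ms=None):
    """Terminate RUNNING jobs in one queue that were created too long ago.

    `batch` is assumed to expose list_jobs/terminate_job like the boto3 Batch client.
    Returns the list of job IDs that were asked to terminate.
    """
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    terminated = []
    kwargs = {"jobQueue": job_queue, "jobStatus": "RUNNING"}
    while True:
        page = batch.list_jobs(**kwargs)
        for job_id in find_stale_job_ids(page["jobSummaryList"], now_ms, max_age_seconds):
            batch.terminate_job(jobId=job_id, reason="Exceeded maximum job age")
            terminated.append(job_id)
        token = page.get("nextToken")
        if not token:
            return terminated
        kwargs["nextToken"] = token  # fetch the next page of results
```

Inside a scheduled Lambda handler this could be called as `terminate_stale_jobs(boto3.client("batch"), "my-queue")`, where `my-queue` is a placeholder queue name.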

Asdfg answered Oct 18 '22 13:10


As of April 2018, AWS Batch supports setting a job timeout, either when submitting a job or in the job definition.

https://aws.amazon.com/about-aws/whats-new/2018/04/aws-batch-adds-support-for-automatic-termination-with-job-execution-timeout/

You specify an attemptDurationSeconds parameter, which must be at least 60 seconds, either in your job definition, or when you submit the job. When this number of seconds has passed following the job attempt's startedAt timestamp, AWS Batch terminates the job. On the compute resource, your job's container receives a SIGTERM signal to give your application a chance to shut down gracefully; if the container is still running after 30 seconds, a SIGKILL signal is sent to forcefully shut down the container.

Source: https://docs.aws.amazon.com/batch/latest/userguide/job_timeouts.html

POST /v1/submitjob HTTP/1.1
Content-type: application/json

{
   ...
   "timeout": { 
      "attemptDurationSeconds": number
   }
}
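
For reference, the same request can be sketched with boto3, assuming `submit_job` is called with the keyword arguments built below. The helper name and the example job, queue, and definition names are hypothetical; the 60-second minimum comes from the documentation quoted above.

```python
MIN_TIMEOUT_SECONDS = 60  # minimum stated in the AWS documentation quoted above


def build_submit_job_args(job_name, job_queue, job_definition, timeout_seconds):
    """Build keyword arguments for a Batch submit_job call with an attempt timeout."""
    if timeout_seconds < MIN_TIMEOUT_SECONDS:
        raise ValueError(
            f"attemptDurationSeconds must be at least {MIN_TIMEOUT_SECONDS}"
        )
    return {
        "jobName": job_name,
        "jobQueue": job_queue,
        "jobDefinition": job_definition,
        # Batch terminates the attempt this many seconds after startedAt.
        "timeout": {"attemptDurationSeconds": int(timeout_seconds)},
    }
```

Usage would look like `boto3.client("batch").submit_job(**build_submit_job_args("nightly", "my-queue", "my-job-def", 3600))`, with all three names being placeholders.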

Luke Waite answered Oct 18 '22 14:10


AFAIK there is no feature to do this. However, a workaround was suggested in the forum for a similar question.

One idea is to call Batch as an Activity from Step Functions and ping back on a schedule (e.g. every minute) from that job. If it stops responding, you can detect that situation as a Timeout in the activity and act accordingly (terminate the job etc.). Not an ideal solution (especially if the job continues to ping back as a "zombie"), but it's a start. You'd also likely have to store activity tokens in a database to map them to Batch job IDs.

Alternatively, you can split that setup into two steps: submit a Batch job from a Lambda in the first state, then pass the Batch job ID to the second step, which polls Batch (from another Lambda) for the job's state with Retry and IntervalSeconds (e.g. once every minute, or even with exponential backoff), and MaxAttempts calculated from your timeout. This way, you don't need any external state storage mechanism, long polling, or even a "ping back" from the job (it CAN be a zombie), but the downside is more steps.
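
A rough sketch of the second step, assuming a fixed polling interval (no backoff) and a `batch` object that behaves like `boto3.client("batch")`. The `JobNotFinished` exception, the helper names, and the choice to match the error by its Python class name are assumptions for illustration.

```python
import math

TERMINAL_STATES = {"SUCCEEDED", "FAILED"}  # Batch job statuses that end polling


class JobNotFinished(Exception):
    """Raised while the job is still in progress so the Retry rule polls again."""


def max_attempts(timeout_seconds, interval_seconds):
    """Number of fixed-interval polls that fit inside the timeout."""
    return math.ceil(timeout_seconds / interval_seconds)


def retry_rule(timeout_seconds, interval_seconds=60):
    """Build a Retry entry for the polling state of the state machine."""
    return {
        "ErrorEquals": ["JobNotFinished"],
        "IntervalSeconds": interval_seconds,
        "MaxAttempts": max_attempts(timeout_seconds, interval_seconds),
        "BackoffRate": 1.0,  # constant interval; raise for exponential backoff
    }


def poll_job(batch, job_id):
    """Return the final status, or raise JobNotFinished to trigger another retry."""
    job = batch.describe_jobs(jobs=[job_id])["jobs"][0]
    if job["status"] not in TERMINAL_STATES:
        raise JobNotFinished(job_id)
    return job["status"]
```

Once the retries are exhausted, the state fails with that error, and a Catch branch could then call `terminate_job` on the stored job ID.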

SriniV answered Oct 18 '22 15:10