I'm using AWS Batch and have started using Array Jobs.
AWS_BATCH_JOB_ARRAY_INDEX is passed as an Environment Variable to the container.
Is the array size passed in some way? I need to know whether the index belongs to an array of 5 jobs or 1000 jobs. Currently I'm passing it as my own environment variable, but I thought that info would already be passed to the container in some way.
This is not possible at the moment. I've made a feature request for it, which you can upvote here: https://github.com/aws/containers-roadmap/issues/1631
In the meantime, I found a hacky workaround. The job ID for array workers appears to follow the pattern $PARENT_JOB_ID:$AWS_BATCH_JOB_ARRAY_INDEX. So, to the extent that you can rely on this formatting of array worker IDs, you can describe the parent job and read the total array size from its description. Here's an example using boto3:
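import os
import boto3
# AWS Batch sets AWS_BATCH_JOB_ID inside the container; for array workers it looks like "<parent-id>:<index>"
worker_job_id = os.environ['AWS_BATCH_JOB_ID']
# Everything before the colon is the parent (array) job ID
parent_job_id = worker_job_id.split(":")[0]
# Describe the parent job to read its array properties
response = boto3.client('batch').describe_jobs(jobs=[parent_job_id])
parent_job = response['jobs'][0]
# 'size' is the total number of child jobs; None if the job is not an array job
array_size = parent_job.get('arrayProperties', {}).get("size")
print("array_size =", array_size)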
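If you want the same workaround to degrade gracefully, here is a minimal sketch that wraps it in two small functions and returns None when the job ID has no ":index" suffix or the parent job cannot be found. The function names split_worker_job_id and lookup_array_size are my own for illustration, not part of boto3 or AWS Batch, and the sketch still relies on the unofficial "parent:index" ID format described above. It also assumes the job's IAM role is allowed to call batch:DescribeJobs.

```python
import os


def split_worker_job_id(worker_job_id):
    """Split '<parent-id>:<index>' into (parent_id, index).

    index is None when there is no ':' in the ID (not an array worker).
    Assumes the unofficial ID format observed for array workers.
    """
    parent_id, sep, index = worker_job_id.partition(":")
    return parent_id, (int(index) if sep else None)


def lookup_array_size(batch_client, worker_job_id):
    """Return the parent's array size, or None if it cannot be determined."""
    parent_id, index = split_worker_job_id(worker_job_id)
    if index is None:
        # Not an array worker, so there is no parent array to describe.
        return None
    response = batch_client.describe_jobs(jobs=[parent_id])
    jobs = response.get("jobs", [])
    if not jobs:
        return None
    return jobs[0].get("arrayProperties", {}).get("size")


# Only runs inside a Batch container, where AWS_BATCH_JOB_ID is set.
if __name__ == "__main__" and "AWS_BATCH_JOB_ID" in os.environ:
    import boto3

    size = lookup_array_size(boto3.client("batch"), os.environ["AWS_BATCH_JOB_ID"])
    print("array_size =", size)
```

Passing the client in as an argument keeps the lookup easy to exercise with a stub object, without touching AWS.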