AWS Glue: get job_id from within the script using pyspark

Question

I am trying to access the run ID of an AWS Glue ETL job from within that job's script. This is the Run ID shown in the first column of the AWS Glue Console, something like jr_5fc6d4ecf0248150067f2. How do I get it programmatically with pyspark?

Brett · Accepted Answer

As documented at https://docs.aws.amazon.com/glue/latest/dg/aws-glue-api-crawler-pyspark-extensions-get-resolved-options.html, the run ID is passed to the Glue job as a command-line argument. You can read JOB_RUN_ID, along with other default/reserved or custom job parameters, using the getResolvedOptions() function.

import sys
from awsglue.utils import getResolvedOptions

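# getResolvedOptions() takes two arguments: the argv list and a list of option names.
# An empty list is passed here because no custom job parameters are needed.
args = getResolvedOptions(sys.argv, [])
job_run_id = args['JOB_RUN_ID']  # e.g. 'jr_5fc6d4ecf0248150067f2'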

NOTE: JOB_RUN_ID is a default identity parameter, so it does not need to be listed in options (the second argument to getResolvedOptions()). Its value is still available at runtime in a Glue job.
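
If you want to try the same idea outside Glue, where the awsglue module is usually not importable, here is a minimal sketch. It assumes only what is stated above: that the run ID arrives on the command line as --JOB_RUN_ID followed by its value. The helper name read_job_run_id and the example argv are hypothetical, made up for illustration, and this is not a replacement for getResolvedOptions() inside a real job.

```python
import argparse


def read_job_run_id(argv):
    """Return the value that follows --JOB_RUN_ID in argv, or None if it is absent.

    Hypothetical helper for local experiments; inside a Glue job, prefer
    awsglue.utils.getResolvedOptions().
    """
    parser = argparse.ArgumentParser(add_help=False, allow_abbrev=False)
    parser.add_argument("--JOB_RUN_ID", required=False)
    # parse_known_args() ignores any other arguments that may be on the command line.
    parsed, _unknown = parser.parse_known_args(argv[1:])
    return parsed.JOB_RUN_ID


# Assumed shape of the argument list; the values are illustrative only.
example_argv = [
    "script.py",
    "--JOB_NAME", "my_job",
    "--JOB_RUN_ID", "jr_5fc6d4ecf0248150067f2",
]
print(read_job_run_id(example_argv))
```

In a script you would pass sys.argv instead of example_argv. Returning None when the argument is missing lets the same script run locally without failing on a missing key.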

Tags:

amazon-web-services

aws-glue
