
AWS Glue: get job_id from within the script using pyspark

I am trying to access the AWS Glue ETL job run ID from within that job's script. This is the Run ID shown in the first column of the AWS Glue console, something like jr_5fc6d4ecf0248150067f2. How do I get it programmatically with PySpark?

Zeitgeist asked Mar 15 '18 13:03



1 Answer

As documented in https://docs.aws.amazon.com/glue/latest/dg/aws-glue-api-crawler-pyspark-extensions-get-resolved-options.html, the run ID is passed to the Glue job as a command-line argument. You can access JOB_RUN_ID and other default/reserved or custom job parameters using the getResolvedOptions() function.

import sys
from awsglue.utils import getResolvedOptions

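# The second argument lists the option names to resolve; it can be empty here
# because JOB_RUN_ID is a reserved parameter that is resolved automatically.
args = getResolvedOptions(sys.argv, [])
job_run_id = args['JOB_RUN_ID']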

NOTE: JOB_RUN_ID is a default identity parameter, so we don't need to include it in options (the second argument to getResolvedOptions()) to get its value at runtime in a Glue job.
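The awsglue package is normally only importable inside the Glue runtime, so for local testing it may help to read the same command-line argument with the standard library. The following is a minimal sketch, not the Glue implementation: the helper name get_job_run_id is hypothetical, and it assumes the run ID arrives on the command line as `--JOB_RUN_ID <value>`.

```python
import argparse


def get_job_run_id(argv):
    """Return the value that follows --JOB_RUN_ID in argv, or None if absent.

    Hypothetical stand-in for local testing; inside a real Glue job,
    awsglue.utils.getResolvedOptions is the documented way to read it.
    """
    # allow_abbrev=False so a shorter flag such as --JOB is not matched by prefix
    parser = argparse.ArgumentParser(add_help=False, allow_abbrev=False)
    parser.add_argument("--JOB_RUN_ID", required=False)
    # parse_known_args leaves any other job parameters unparsed instead of failing
    parsed, _unknown = parser.parse_known_args(argv[1:])
    return parsed.JOB_RUN_ID


# Simulated command line; the run ID is the sample value from the question
sample_argv = ["script.py", "--JOB_NAME", "demo", "--JOB_RUN_ID", "jr_5fc6d4ecf0248150067f2"]
print(get_job_run_id(sample_argv))
```

In a real job you would pass sys.argv instead of the simulated list. Returning None when the flag is missing lets the same script run outside Glue without raising.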

Brett answered Sep 18 '22 01:09
