I am trying to access the AWS ETL Glue job id from the script of that job. This is the RunID that you can see in the first column in the AWS Glue Console, something like jr_5fc6d4ecf0248150067f2
. How do I get it programmatically with pyspark?
As it's documented in https://docs.aws.amazon.com/glue/latest/dg/aws-glue-api-crawler-pyspark-extensions-get-resolved-options.html, it's passed in as a command line argument to the Glue Job. You can access the JOB_RUN_ID
and other default/reserved or custom job parameters using getResolvedOptions()
function.
import sys
from awsglue.utils import getResolvedOptions
args = getResolvedOptions(sys.argv)
job_run_id = args['JOB_RUN_ID']
NOTE: JOB_RUN_ID
is a default identity parameter, we don't need to include it as part of options
(the second argument to getResolvedOptions()
) for getting its value during runtime in a Glue Job.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With