Would someone be able to provide an example of what an AWS CloudFormation AWS::Glue::Workflow template would look like?

Tags: amazon-web-services, amazon-cloudformation, aws-glue

I have been searching for an example of how to set up CloudFormation for a Glue workflow that includes triggers, jobs, and crawlers, but I haven't been able to find much information on it.

This is the only piece of information I have been able to find from AWS:

{
  "Type" : "AWS::Glue::Workflow",
  "Properties" : {
      "DefaultRunProperties" : Json,
      "Description" : String,
      "Name" : String,
      "Tags" : Json
    }
}


asked Oct 08 '19 21:10

Travis Brannan

2 Answers

Here's an example of a workflow with one crawler and a job to be run after the crawler finishes.

The workflow's contents are defined by setting the WorkflowName property on each trigger.

I believe there can be only one SCHEDULED or ON_DEMAND trigger to start the workflow. All the other triggers in the workflow need to be CONDITIONAL on the jobs or crawlers that run before them. That's probably how Glue knows how to build the DAG.

Also note how the workflow parameters are defined as JSON in DefaultRunProperties.

---
AWSTemplateFormatVersion: '2010-09-09'

Parameters:
  BaseBucket:
    Description: Bucket used by my workflow jobs
    Type: String

Resources:
  MyWorkflow:
    Type: AWS::Glue::Workflow
    Properties:
      DefaultRunProperties:
        {
          "workflowParameter1": "Foo",
          "workflowParameter2": "Bar",
          "bucket": { "Fn::Sub": "${BaseBucket}" }
        }
      Description: Workflow for orchestrating my jobs
      Name: MyWorkflowName

  WorkflowCrawler:
    Type: AWS::Glue::Crawler
    Properties:
      Name: MyCrawler
      Role: MyCrawlerRole
      Description: A crawler to run as the first step in the workflow
      DatabaseName: MyDatabase
      Targets:
        S3Targets:
          - Path: !Sub "s3://${BaseBucket}/"

  WorkflowJob:
    Type: AWS::Glue::Job
    Properties:
      Description: Glue job to run after the crawler
      Name: MyWorkflowJob
      Role: MyJobRole
      Command:
        Name: pythonshell
        PythonVersion: "3"
        ScriptLocation: !Sub "s3://${BaseBucket}/my_workflow_job_script.py"

  WorkflowStartTrigger:
    Type: AWS::Glue::Trigger
    Properties:
      Name: StartTrigger
      Type: ON_DEMAND
      Description: Trigger for starting the workflow
      Actions:
        - CrawlerName: !Ref WorkflowCrawler
      WorkflowName: !Ref MyWorkflow

  WorkflowJobTrigger:
    Type: AWS::Glue::Trigger
    Properties:
      Name: CrawlerSuccessfulTrigger
      Type: CONDITIONAL
      StartOnCreation: True
      Description: Trigger to start the glue job
      Actions:
        - JobName: !Ref WorkflowJob
      Predicate:
        Conditions:
          - LogicalOperator: EQUALS
            CrawlerName: !Ref WorkflowCrawler
            CrawlState: SUCCEEDED
      WorkflowName: !Ref MyWorkflow
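
As a variation on WorkflowStartTrigger above, here is a minimal sketch of starting the same workflow on a schedule instead of on demand. The cron expression is only an illustrative assumption, and this resource would replace WorkflowStartTrigger rather than sit alongside it, given the one-start-trigger restriction mentioned earlier.

```yaml
  WorkflowStartTrigger:
    Type: AWS::Glue::Trigger
    Properties:
      Name: StartTrigger
      Type: SCHEDULED
      # Assumed schedule for illustration: every day at 12:00 UTC
      Schedule: cron(0 12 * * ? *)
      # Activate the trigger as soon as the stack creates it
      StartOnCreation: true
      Description: Trigger for starting the workflow on a schedule
      Actions:
        - CrawlerName: !Ref WorkflowCrawler
      WorkflowName: !Ref MyWorkflow
```

The rest of the template stays the same: the CONDITIONAL trigger still fires the job once the crawler succeeds.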


answered Sep 28 '22 17:09

antti

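Here is an example of a Glue workflow using triggers, crawlers and a job to convert JSON to Parquet. The crawlers and the ETL job referenced with !Ref (RawJSONCrawler, JSONToParquetETLJob, RawParquetCrawler) are assumed to be defined elsewhere in the same template: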

JSONtoParquetWorkflow:
  Type: AWS::Glue::Workflow
  Properties:
    Name: json-to-parquet-workflow
    Description: Workflow for orchestrating JSON to Parquet conversion
RawJSONCrawlerTrigger:
  Type: AWS::Glue::Trigger
  Properties:
    WorkflowName: !Ref JSONtoParquetWorkflow
    Name: raw-json-crawler-trigger
    Description: Start crawler for raw JSON data
    Type: ON_DEMAND
    Actions:
      - CrawlerName: !Ref RawJSONCrawler
JSONToParquetETLJobTrigger:
  Type: AWS::Glue::Trigger
  Properties:
    WorkflowName: !Ref JSONtoParquetWorkflow
    Name: json-to-parquet-etl-trigger
    Description: Start JSON to Parquet ETL job
    Type: CONDITIONAL
    StartOnCreation: True
    Predicate:
      Conditions:
        - LogicalOperator: EQUALS
          CrawlerName: !Ref RawJSONCrawler
          CrawlState: SUCCEEDED
    Actions:
      - JobName: !Ref JSONToParquetETLJob
RawParquetCrawlerTrigger:
  Type: AWS::Glue::Trigger
  Properties:
    WorkflowName: !Ref JSONtoParquetWorkflow
    Name: raw-parquet-crawler-trigger
    Description: Start crawler for raw Parquet data
    Type: CONDITIONAL
    StartOnCreation: True
    Predicate:
      Conditions:
        - LogicalOperator: EQUALS
          JobName: !Ref JSONToParquetETLJob
          State: SUCCEEDED
    Actions:
      - CrawlerName: !Ref RawParquetCrawler
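
The snippet above refers to resources it does not show. A hedged sketch of what RawJSONCrawler and JSONToParquetETLJob could look like follows; the role, database, bucket, script, and resource Name values are all hypothetical placeholders, not values from this answer.

```yaml
RawJSONCrawler:
  Type: AWS::Glue::Crawler
  Properties:
    Name: raw-json-crawler                  # hypothetical name
    Role: MyCrawlerRole                     # hypothetical IAM role
    DatabaseName: my_database               # hypothetical Glue database
    Targets:
      S3Targets:
        - Path: s3://my-bucket/raw/json/    # hypothetical input location
JSONToParquetETLJob:
  Type: AWS::Glue::Job
  Properties:
    Name: json-to-parquet-etl               # hypothetical name
    Role: MyJobRole                         # hypothetical IAM role
    GlueVersion: "4.0"                      # assumed Glue version
    Command:
      Name: glueetl                         # Spark ETL job type
      PythonVersion: "3"
      ScriptLocation: s3://my-bucket/scripts/json_to_parquet.py  # hypothetical script
```

RawParquetCrawler would follow the same shape as RawJSONCrawler, pointed at wherever the job writes its Parquet output.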

answered Sep 28 '22 15:09

abk
