In my Glue job, I have enabled Spark UI and specified all the necessary details (s3 related etc.) needed for Spark UI to work.
How can I view the DAG/Spark UI of my Glue job?
Click Analytics > Spark Analytics > Open the Spark Application Monitoring Page. Click Monitor > Workloads, and then click the Spark tab. This page displays the user names of the clusters that you are authorized to monitor and the number of applications that are currently running in each cluster.
Enable the Spark UI option in the Glue job: create a new job and, in the monitoring section, enable the Spark UI option and provide the S3 path where the event logs will be written. Then start a Spark History Server using Docker on EC2 and access the Spark UI on that history server.
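If the job is defined from code rather than the console, the same two settings can be passed as job parameters. The following is a minimal sketch, assuming the standard Glue parameters --enable-spark-ui and --spark-event-logs-path; the helper name and the bucket path are made up for illustration.

```python
def spark_ui_job_arguments(event_log_path: str) -> dict:
    """Build the Glue job parameters that turn on Spark UI event logging.

    event_log_path is the S3 location where Glue should write the Spark
    event logs (the bucket used below is a placeholder, not a real one).
    """
    if not event_log_path.startswith("s3://"):
        raise ValueError("event log path must be an s3:// URI")
    return {
        "--enable-spark-ui": "true",
        "--spark-event-logs-path": event_log_path,
    }


# Hypothetical bucket and prefix, for illustration only.
args = spark_ui_job_arguments("s3://my-example-bucket/sparkui-logs/")
print(args)
```

The resulting dictionary is the kind of value that boto3 accepts as DefaultArguments in glue.create_job or as Arguments in glue.start_job_run, so it can be merged with whatever other parameters the job already uses.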
Now try to access the Spark web UI. Go to your EC2 instance, copy its public IPv4 address, append :18080 to it, and paste the result in a new browser tab. The history server will show the Spark UI for the Glue jobs. If your job's logs show up there, great, you've done it.
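The same check can be scripted. This is a rough sketch, assuming the history server is reachable over plain HTTP on port 18080 and exposes Spark's usual REST endpoint /api/v1/applications; the IP address is a documentation placeholder and both function names are invented here.

```python
import json
from urllib.request import urlopen

HISTORY_SERVER_PORT = 18080  # default Spark History Server port


def history_server_url(public_ip: str, port: int = HISTORY_SERVER_PORT) -> str:
    """Return the base URL to open in the browser."""
    return f"http://{public_ip}:{port}"


def list_applications(base_url: str, timeout: float = 5.0) -> list:
    """Ask the history server which applications it has loaded.

    Performs a real HTTP request, so it only works once the server is
    running and the port is reachable from where this code runs.
    """
    with urlopen(f"{base_url}/api/v1/applications", timeout=timeout) as resp:
        return json.load(resp)


# 203.0.113.10 is a placeholder; use your instance's public IPv4 address.
url = history_server_url("203.0.113.10")
print(url)
# Once the server is up: print(list_applications(url))
```

An empty list from list_applications usually means the server is running but has not found any event logs at the configured S3 path yet.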
You can use the Apache Spark web UI to monitor and debug AWS Glue ETL jobs running on the AWS Glue job system, and also Spark applications running on AWS Glue development endpoints. The Spark UI enables you to check the following for each job:
You can still use AWS Glue continuous logging to view the Spark application log streams for Spark driver and executors. For more information, see Continuous Logging for AWS Glue Jobs.
You need to set up an EC2 instance that can host the history server.
The documentation below has links to CloudFormation templates that you can use. https://docs.aws.amazon.com/glue/latest/dg/monitor-spark-ui-history.html
You can access the history server via the EC2 instance (on port 18080 by default). You need to configure the network and ports accordingly, so that port 18080 is reachable from your machine.
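As one way to handle the port configuration, the instance's security group needs an inbound rule for TCP 18080. Below is a small sketch that builds such a rule in the shape boto3's EC2 client expects for IpPermissions; the CIDR is a placeholder and the helper name is made up, so treat it as a starting point rather than a recommended setup.

```python
import ipaddress

HISTORY_SERVER_PORT = 18080  # default Spark History Server port


def history_server_ingress_rule(allowed_cidr: str,
                                port: int = HISTORY_SERVER_PORT) -> dict:
    """Build one inbound TCP rule that opens the history server port.

    allowed_cidr should be as narrow as possible, for example a single
    workstation address written as a /32.
    """
    ipaddress.ip_network(allowed_cidr)  # raises ValueError if malformed
    return {
        "IpProtocol": "tcp",
        "FromPort": port,
        "ToPort": port,
        "IpRanges": [
            {"CidrIp": allowed_cidr, "Description": "Spark History Server UI"}
        ],
    }


# 198.51.100.7/32 is a documentation address; replace it with your own.
rule = history_server_ingress_rule("198.51.100.7/32")
print(rule)

# Applying it needs AWS credentials and the instance's security group ID:
# import boto3
# boto3.client("ec2").authorize_security_group_ingress(
#     GroupId=security_group_id, IpPermissions=[rule])
```

Restricting the rule to a single address avoids exposing the history server to the whole internet, which matters because the UI itself has no login by default.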
EDIT - There is also an option to set up the Spark UI locally. This requires downloading the Docker image from the aws-glue-samples repo and setting the AWS credentials and S3 location there. This server consumes the files that the Glue job generates. The files are about 4 MB in size.
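For the local option, the credentials and S3 location are handed to the container through SPARK_HISTORY_OPTS. The sketch below only assembles the docker run command as a list of arguments; it does not run anything. The image tag glue/sparkui:latest, the in-container launcher path, and the bucket are assumptions on my part, so check them against the README in the aws-glue-samples repo before relying on them.

```python
import os
import shlex


def spark_history_opts(event_log_path: str, access_key: str, secret_key: str) -> str:
    """Build the SPARK_HISTORY_OPTS value for the history server container.

    The event log location is rewritten from s3:// to s3a://, the scheme
    the Hadoop S3A connector reads from.
    """
    if not event_log_path.startswith("s3://"):
        raise ValueError("event log path must be an s3:// URI")
    s3a_path = "s3a://" + event_log_path[len("s3://"):]
    return " ".join([
        f"-Dspark.history.fs.logDirectory={s3a_path}",
        f"-Dspark.hadoop.fs.s3a.access.key={access_key}",
        f"-Dspark.hadoop.fs.s3a.secret.key={secret_key}",
    ])


def docker_run_command(opts: str, image: str = "glue/sparkui:latest",
                       port: int = 18080) -> list:
    """Assemble (but do not execute) the docker run invocation.

    The image tag and launcher path are assumptions, not confirmed values.
    """
    return [
        "docker", "run", "-itd",
        "-e", f"SPARK_HISTORY_OPTS={opts}",
        "-p", f"{port}:{port}",
        image,
        "/opt/spark/bin/spark-class org.apache.spark.deploy.history.HistoryServer",
    ]


# Hypothetical bucket; credentials come from the environment when set.
opts = spark_history_opts(
    "s3://my-example-bucket/sparkui-logs/",
    os.environ.get("AWS_ACCESS_KEY_ID", "<access-key>"),
    os.environ.get("AWS_SECRET_ACCESS_KEY", "<secret-key>"),
)
print(shlex.join(docker_run_command(opts)))  # note: output includes the keys
```

Once the container is up, the UI should be at http://localhost:18080, the same port as in the EC2 setup.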