
How to view AWS Glue Spark UI

In my Glue job, I have enabled the Spark UI and specified all the details it needs to work (the S3 settings, etc.).
How can I view the DAG/Spark UI of my Glue job?

Ankur Shrivastava asked Nov 19 '19 14:11



1 Answer

You need to set up an EC2 instance that can host the Spark history server.

The documentation below links to CloudFormation templates that you can use: https://docs.aws.amazon.com/glue/latest/dg/monitor-spark-ui-history.html

You can then access the history server through the EC2 instance (port 18080 by default). You need to configure the network and open the port accordingly.
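As a quick way to confirm that the network and port configuration is right, here is a minimal Python sketch that builds the history server URL and checks whether the port accepts TCP connections. The host name in the usage part is a placeholder, not a real instance; only the default port 18080 comes from the answer above.

```python
import socket

# Default Spark history server port mentioned in the answer.
HISTORY_SERVER_PORT = 18080


def history_server_url(host: str, port: int = HISTORY_SERVER_PORT) -> str:
    """Build the URL to open in a browser for the history server."""
    return f"http://{host}:{port}"


def is_port_open(host: str, port: int = HISTORY_SERVER_PORT,
                 timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS failures.
        return False


if __name__ == "__main__":
    # Placeholder host: replace with your instance's public DNS name or IP.
    host = "ec2-host.example.com"
    print(history_server_url(host))
    if not is_port_open(host):
        print("Port not reachable; check the security group and subnet rules.")
```

If the check fails while the server is running, the usual suspects are the instance's security group inbound rules or a private subnet without a route to your machine.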

EDIT - There is also an option to set up the Spark UI locally. This requires downloading the Docker image from the aws-glue-samples repo and setting the AWS credentials and S3 location there. This server consumes the files that the Glue job generates. The files are about 4 MB in size.
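The local setup can be sketched as a small Python helper that assembles the `docker run` command. This is an illustration under assumptions, not the repo's official script: the image tag `glue/sparkui:latest`, the `spark-class` path inside the container, and the helper name are all assumed here, so check them against the aws-glue-samples README. The `spark.history.fs.logDirectory` and `spark.hadoop.fs.s3a.*` settings are standard Spark/Hadoop options.

```python
import shlex


def history_server_docker_command(event_log_dir, access_key_id,
                                  secret_access_key,
                                  image="glue/sparkui:latest",  # assumed tag
                                  host_port=18080):
    """Assemble a `docker run` argument list for a local history server.

    event_log_dir is the S3 location the Glue job writes to, given with
    the s3a:// scheme (for example "s3a://my-bucket/sparkui-logs/").
    """
    history_opts = " ".join([
        f"-Dspark.history.fs.logDirectory={event_log_dir}",
        f"-Dspark.hadoop.fs.s3a.access.key={access_key_id}",
        f"-Dspark.hadoop.fs.s3a.secret.key={secret_access_key}",
    ])
    return [
        "docker", "run", "-itd",
        "-e", f"SPARK_HISTORY_OPTS={history_opts}",
        "-p", f"{host_port}:18080",  # container listens on 18080
        image,
        # Assumed path of spark-class inside the sample image.
        "/opt/spark/bin/spark-class "
        "org.apache.spark.deploy.history.HistoryServer",
    ]


if __name__ == "__main__":
    # Placeholder values; never commit real credentials.
    cmd = history_server_docker_command(
        "s3a://my-bucket/sparkui-logs/", "YOUR_KEY_ID", "YOUR_SECRET")
    print(shlex.join(cmd))  # paste into a shell, or pass cmd to subprocess.run
```

Once the container is up, the UI should be reachable at http://localhost:18080.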

jay.cs answered Oct 14 '22 14:10