Spark UI on AWS EMR

I am running an AWS EMR cluster with Spark (1.3.1) installed via the EMR console dropdown. Spark is running and processing data, but I am trying to find which port has been assigned to the web UI. I've tried forwarding both port 4040 and port 8080, with no connection. I'm forwarding like so:

ssh -i ~/KEY.pem -L 8080:localhost:8080 hadoop@EMR_DNS

1) How do I find out what the Spark WebUI's assigned port is? 2) How do I verify the Spark WebUI is running?

asked Jul 16 '15 16:07 by gallamine


2 Answers

Here is an alternative if you don't want to deal with the browser SOCKS setup suggested in the EMR docs.

  1. Open an SSH tunnel to the master node, with port forwarding to the machine running the Spark UI:

    ssh -i path/to/aws.pem  -L 4040:SPARK_UI_NODE_URL:4040 hadoop@MASTER_URL
    

    MASTER_URL (EMR_DNS in the question) is the address of the master node, which you can get from the cluster's page in the EMR Management Console.

    SPARK_UI_NODE_URL can be seen near the top of the stderr log. The log line will look something like:

    16/04/28 21:24:46 INFO SparkUI: Started SparkUI at http://10.2.5.197:4040
    
  2. Point your browser to localhost:4040

Tried this on EMR 4.6 running Spark 1.6.1.
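The two steps above could be scripted roughly as follows. This is only a sketch under assumptions that are not in the answer: the stderr log has been saved to a local file, and the helper names `spark_ui_address` and `tunnel_command` are hypothetical. The second helper only prints the `ssh` command so you can inspect it before running it.

```shell
# Sketch only: helper names and the local log file are assumptions.

# Print "host:port" taken from the first
# "Started SparkUI at http://host:port" line in the given log file.
spark_ui_address() {
  sed -n 's|.*Started SparkUI at http://\([^/[:space:]]*\).*|\1|p' "$1" | head -n 1
}

# Print (do not run) the ssh command that forwards the UI port
# on localhost to the node that hosts the Spark UI.
tunnel_command() {
  local key="$1" master="$2" address="$3"
  local port="${address##*:}"   # text after the last colon, e.g. 4040
  printf 'ssh -i %s -L %s:%s hadoop@%s\n' "$key" "$port" "$address" "$master"
}
```

For example, `tunnel_command path/to/aws.pem MASTER_URL "$(spark_ui_address stderr.log)"` prints a command of the same shape as the one in step 1, with the port read from the log instead of assumed to be 4040.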

answered Oct 09 '22 04:10 by ud3sh


Spark on EMR is configured to run on YARN, so the Spark UI is available at the application URL provided by the YARN Resource Manager (http://spark.apache.org/docs/latest/monitoring.html). The easiest way to get to it is to set up your browser to use a SOCKS proxy on a port opened by SSH, then open the Resource Manager from the EMR console and click the Application Master URL shown to the right of the running application. The Spark History Server is available at its default port, 18080.

An example of SOCKS setup with EMR is at http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-web-interfaces.html
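As a rough illustration of the SOCKS route, the sketch below prints the two commands involved: an `ssh` dynamic port forward (`-D`) and a `curl` request sent through that proxy to the History Server on port 18080. The helper names and the local port 8157 are assumptions chosen for the example, not values from this answer; any unused local port should work.

```shell
# Sketch only: helper names and the default local port are assumptions.

# Print (do not run) the ssh command that opens a SOCKS proxy on a local port.
# -N: run no remote command; -D: dynamic (SOCKS) port forwarding.
socks_proxy_command() {
  local key="$1" master="$2" socks_port="${3:-8157}"
  printf 'ssh -i %s -N -D %s hadoop@%s\n' "$key" "$socks_port" "$master"
}

# Print a curl command that requests the Spark History Server page
# through the proxy, as a quick check that the UI is reachable.
history_server_check_command() {
  local master="$1" socks_port="${2:-8157}"
  printf 'curl --socks5-hostname localhost:%s http://%s:18080/\n' "$socks_port" "$master"
}
```

Run the first printed command in one terminal and the second in another. If the `curl` request returns HTML, the History Server is up, and a browser pointed at the same SOCKS port should be able to follow the Application Master link from the Resource Manager.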

answered Oct 09 '22 02:10 by ChristopherB