I am trying to debug a Spark application on a cluster using a master and several worker nodes. I have successfully set up the master and worker nodes using the Spark standalone cluster manager. I downloaded the Spark folder with binaries and use the following commands to set up the master and worker nodes. These commands are executed from the Spark directory.
command for launching master
./sbin/start-master.sh
command for launching worker node
./bin/spark-class org.apache.spark.deploy.worker.Worker master-URL
command for submitting application
./bin/spark-submit --class Application --master URL ~/app.jar
Now, I would like to understand the flow of control through the Spark source code on the worker nodes when I submit my application (I just want to use one of the given examples that use reduce()). I am assuming I should set up Spark in Eclipse. The Eclipse setup link on the Apache Spark website seems to be broken. I would appreciate some guidance on setting up Spark and Eclipse to enable stepping through the Spark source code on the worker nodes.
Thanks!
To start debugging, select Run -> Debug SparkLocalDebug; this attaches to the application on port 5005. You should now see your spark-submit application running, and when it hits a breakpoint, control passes to IntelliJ.
Now you should be ready to debug. Simply start Spark with the above command, then select the IntelliJ run configuration you just created and click Debug. IntelliJ should connect to your Spark application, which should then start running. You can set breakpoints, inspect variables, etc.
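As a minimal sketch of the launch side (the class and jar names are taken from the question, the master URL is a placeholder, and SPARK_SUBMIT_OPTS is one way to pass extra JVM options to spark-submit; in client/local mode the spark-submit JVM is the driver):

# driver JVM listens on 5005 and waits until the IDE attaches
export SPARK_SUBMIT_OPTS="-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=5005"
./bin/spark-submit --class Application --master master-URL ~/app.jar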
Start the job, open the Spark UI, and find out which node your process is running on. Use ssh to forward the port specified in the debug agent from that node to your local machine through the edge node, then start the remote debug session from your IDE using localhost as the IP and the forwarded port.
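As a sketch (the hostnames, username, and port 5005 are placeholders; 5005 is assumed to be the port the debug agent listens on), the forwarding step could look like:

# forward local port 5005 to port 5005 on the worker node, via the edge node
ssh -L 5005:worker-node:5005 user@edge-node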
It's important to distinguish between debugging the driver program and debugging one of the executors; they require different options passed to spark-submit.
For debugging the driver, add the following to your spark-submit command, then set your remote debugger to connect to the node where you launched your driver program.
--driver-java-options -agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=5005
In this example port 5005 was specified, but you may need to customize that if something is already running on that port.
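Put together, a driver-debug submission might look like this (the class and jar names are taken from the question and the master URL is a placeholder):

./bin/spark-submit \
  --class Application \
  --master master-URL \
  --driver-java-options -agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=5005 \
  ~/app.jar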
Connecting to an executor is similar; add the following options to your spark-submit command.
--num-executors 1 --executor-cores 1 --conf "spark.executor.extraJavaOptions=-agentlib:jdwp=transport=dt_socket,server=n,address=wm1b0-8ab.yourcomputer.org:5005,suspend=n"
Replace the address with your local computer's address. (It's a good idea to test that you can access it from your spark cluster).
In this case, start your debugger in listening mode, then start your spark program and wait for the executor to attach to your debugger. It's important to set the number of executors to 1 or multiple executors will all try to connect to your debugger, likely causing problems.
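A full executor-debug submission might look like the following sketch (your-laptop.example.org is a placeholder for your own machine's address, the class and jar names are from the question, and your IDE must already be listening on port 5005):

./bin/spark-submit \
  --class Application \
  --master yarn-client \
  --num-executors 1 \
  --executor-cores 1 \
  --conf "spark.executor.extraJavaOptions=-agentlib:jdwp=transport=dt_socket,server=n,suspend=n,address=your-laptop.example.org:5005" \
  ~/app.jar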
These examples are for running with sparkMaster set as yarn-client, although they may also work when running under Mesos. If you're running in yarn-cluster mode, you may have to set the driver to attach to your debugger rather than attaching your debugger to the driver, since you won't necessarily know in advance which node the driver will be executing on.
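In that case the driver-side agent can be flipped into client mode so the driver dials out to a listening debugger on your machine; for example (the address is a placeholder, and your IDE must already be listening on that port):

--driver-java-options -agentlib:jdwp=transport=dt_socket,server=n,suspend=y,address=your-laptop.example.org:5005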
You could run the Spark application in local mode if you just need to debug the logic of your transformations. This can be run in your IDE, and you'll be able to debug it like any other application:
val conf = new SparkConf().setMaster("local").setAppName("myApp")
You're of course not distributing the work with this setup. Distributing it is as easy as changing the master to point to your cluster.
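For example, a minimal self-contained app using reduce() that can be stepped through directly in the IDE (the object name MyApp is arbitrary, and it assumes the spark-core dependency is on the classpath):

import org.apache.spark.{SparkConf, SparkContext}

object MyApp {
  def main(args: Array[String]): Unit = {
    // local mode: driver and executor code run in this single JVM, so IDE breakpoints hit normally
    val conf = new SparkConf().setMaster("local").setAppName("myApp")
    val sc = new SparkContext(conf)
    // a reduce() similar to the bundled examples; put a breakpoint inside the lambda
    val sum = sc.parallelize(1 to 100).reduce(_ + _)
    println(s"sum = $sum")
    sc.stop()
  }
}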