
Eclipse remote debug spark-submit

Tags:

apache-spark

I want to remotely debug an application that I submit to Spark with spark-submit from the Command Prompt on Windows.

How can I tell Spark which remote debug port to open?

Vawani asked Oct 04 '17 10:10


People also ask

How do I run spark submit in debug mode?

To start the application, select Run -> Debug SparkLocalDebug; this starts a debug session that attaches to port 5005. Your spark-submit application should now be running, and when it hits a breakpoint, control passes to IntelliJ.

How do I debug a spark application?

Start your Spark application with the JDWP debug options on the spark-submit command, then select the IntelliJ remote debug run configuration and click Debug. IntelliJ should connect to your Spark application, which then starts running. You can set breakpoints, inspect variables, and so on.

What happens when we submit a spark submit?

When you run spark-submit, a driver program is launched. The driver requests resources from the cluster manager and, at the same time, starts the main program of your application.


1 Answer

You can always remotely debug Spark applications written in Java or Scala. If you want to test with a small dataset in local mode, you can use the ordinary step debugger. Have a look at the Java debugger and how it works first.

Java step debugger

If you want Spark-specific debugging in remote mode, check this link; it has a very good step-by-step example.

spark-remote-debugging

To debug the driver, add the following to your spark-submit command. Then point your remote debugger at the node on which you launched the driver program.

--driver-java-options -agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=5005
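As a sketch of where that option goes, the Python helper below assembles a full spark-submit argument list for debugging the driver. It is illustrative only: the names `jdwp_agent` and `driver_debug_submit`, the jar `my-app.jar`, and the class `com.example.MyApp` are hypothetical, and it assumes `spark-submit` (or `spark-submit.cmd` on Windows) is on your PATH. The `host` parameter reflects the fact that on JDK 9 and later a bare `address=5005` binds to localhost only, so `host="*"` is needed when Eclipse runs on a different machine.

```python
import sys
from typing import List


def jdwp_agent(port: int = 5005, server: bool = True, suspend: bool = True,
               host: str = "") -> str:
    """Return a -agentlib:jdwp=... option string for the JVM debug agent."""
    if not 0 < port < 65536:
        raise ValueError(f"invalid TCP port: {port}")
    # host="" gives address=<port>; on JDK 9+ that listens on localhost only.
    # Use host="*" to accept a debugger connecting from another machine.
    address = f"{host}:{port}" if host else str(port)
    return ("-agentlib:jdwp=transport=dt_socket,"
            f"server={'y' if server else 'n'},"
            f"suspend={'y' if suspend else 'n'},"
            f"address={address}")


def driver_debug_submit(app_jar: str, main_class: str, master: str,
                        port: int = 5005, host: str = "") -> List[str]:
    """Build spark-submit arguments that start the driver under a JDWP agent."""
    # Spark ships bin/spark-submit.cmd for the Windows Command Prompt.
    launcher = "spark-submit.cmd" if sys.platform == "win32" else "spark-submit"
    return [
        launcher,
        "--master", master,
        "--class", main_class,
        # server=y: the driver listens; suspend=y: it waits for the debugger.
        "--driver-java-options", jdwp_agent(port, server=True, suspend=True, host=host),
        app_jar,
    ]


if __name__ == "__main__":
    # Hypothetical application; replace with your own jar and main class.
    argv = driver_debug_submit("my-app.jar", "com.example.MyApp", "local[*]")
    print(argv)
    # To launch it: subprocess.run(argv, check=True)
```

Once the driver is waiting, an Eclipse "Remote Java Application" debug configuration using the Socket Attach connection type, pointed at the driver host and port 5005, should be able to connect.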

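You can change the port as needed. With the options above (server=y), the driver JVM listens on port 5005 and, because of suspend=y, waits until your debugger attaches. On JDK 9 and later, address=5005 binds to localhost only, so use address=*:5005 if the debugger runs on another machine. To debug an executor instead, reverse the roles: start your debugger in listening mode, then start your Spark program and wait for the executor to attach to your debugger. It is important to set the number of executors to 1; otherwise multiple executors will all try to connect to your debugger, which is likely to cause problems. Check the example linked above for more details.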
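For the reversed, listening-debugger setup, a minimal sketch could look like the following. It assumes the executor JVM options are passed through Spark's `spark.executor.extraJavaOptions` setting, that `--num-executors` is honoured by your cluster manager (spark-submit applies it on YARN; other managers may need different settings), and that dynamic allocation is switched off so no extra executors appear later. The function name, jar, main class, and debugger host are hypothetical placeholders.

```python
import sys
from typing import List


def executor_debug_submit(app_jar: str, main_class: str, master: str,
                          debugger_host: str, port: int = 5005) -> List[str]:
    """Build spark-submit arguments so one executor attaches to a listening debugger."""
    launcher = "spark-submit.cmd" if sys.platform == "win32" else "spark-submit"
    # server=n: the executor JVM connects out to debugger_host:port, so the
    # debugger must already be listening there (if not, the JVM fails to start).
    # suspend=y: the executor pauses at startup, before running any task.
    agent = ("-agentlib:jdwp=transport=dt_socket,server=n,suspend=y,"
             f"address={debugger_host}:{port}")
    return [
        launcher,
        "--master", master,
        "--class", main_class,
        # A single executor means a single JVM tries to reach the debugger.
        "--num-executors", "1",
        # Keep dynamic allocation from adding executors later on.
        "--conf", "spark.dynamicAllocation.enabled=false",
        "--conf", f"spark.executor.extraJavaOptions={agent}",
        app_jar,
    ]


if __name__ == "__main__":
    # Hypothetical application and debugger host; replace with your own values.
    print(executor_debug_submit("my-app.jar", "com.example.MyApp", "yarn",
                                "my-laptop.example.com"))
```

In Eclipse, the listening side would be a "Remote Java Application" configuration with the Socket Listen connection type on the same port, started before running spark-submit.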

If you are using a JetBrains IDE (IntelliJ IDEA), see this example: Spark remote debugging

Indrajit Swain answered Sep 29 '22 20:09