I'm using Scala 2.11.8 and Spark 2.1.0. I'm totally new to Scala.
Is there a simple way to add a single line breakpoint, similar to Python:
import pdb; pdb.set_trace()
where I'll be dropped into a Scala shell and I can inspect what's going on at that line of execution in the script? (I'd settle for just the end of the script, too...)
I'm currently starting my scripts like so:
$SPARK_HOME/bin/spark-submit --class "MyClassName" --master local target/scala-2.11/my-class-name_2.11-1.0.jar
Is there a way to do this? It would help with debugging immensely.
EDIT: The solutions in this other SO post were not very helpful: they required lots of boilerplate and didn't work.
I would recommend one of the following two options:
The first option: the basic idea is that you debug your app just as you would debug an ordinary piece of code from within your IDE. The Run->Evaluate expression function lets you prototype code, and you can use most of the debugger's usual functionality: variable displays, stepping (over), and so on. However, since you're not running the application from within your IDE, you need to:
1. set up your IDE for remote debugging, and
2. supply the application with the correct JVM options for remote debugging.
For 1, go to Run->Edit configurations, hit the + button in the top right-hand corner, select Remote, and copy the content of the text field under Command line arguments for running remote JVM (official help).
For 2, you can use the SPARK_SUBMIT_OPTS environment variable to pass those JVM options, e.g.:
SPARK_SUBMIT_OPTS="-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=5005" \
$SPARK_HOME/bin/spark-submit --class Main --master "spark://127.0.0.1:7077" \
./path/to/foo-assembly-1.0.0.jar
Now you can hit the debug button and set breakpoints, etc.
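For reference, here is a minimal sketch of what a debuggable driver-side program might look like; the Main object name matches the --class in the command above, and everything else is illustrative. Since the example uses suspend=y, the JVM waits for the debugger to attach before main starts.

import org.apache.spark.sql.SparkSession

object Main {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("debug-example").getOrCreate()

    val df = spark.range(0, 100).toDF("n")  // driver-side code: a breakpoint here will be hit

    val total = df.count()                  // action: kicks off distributed work on the executors
    println(s"count = $total")              // inspect `total` via Run->Evaluate expression

    spark.stop()
  }
}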
The second option: if you're writing more script-style Scala, you may find it helpful to write it in a Zeppelin Spark Scala interpreter. While it's closer to Jupyter/IPython notebooks or the ipython shell than to (i)pdb, it does let you inspect what's going on at runtime. It also lets you graph your data, etc. I'd start with these docs.
One caveat: I think the above will only let you debug code running on the driver node, not on the worker nodes (which run your actual map, reduce, etc. functions). If you, for example, set a breakpoint inside an anonymous function passed to myDataFrame.map{ ... }, it probably won't be hit, since that code is executed on some worker node. However, with e.g. myDataFrame.head and the Evaluate expression functionality, I've been able to fulfil most of my debugging needs. Having said that, I haven't tried specifically passing Java options to the executors, so perhaps it's possible (but probably tedious) to get that to work.
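To make that concrete, here is a small sketch of driver-side inspection; the DataFrame is just a stand-in for your own:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("inspect-example").getOrCreate()
val myDataFrame = spark.range(0, 1000).toDF("n")   // stand-in for your own DataFrame

val firstRow = myDataFrame.head()    // a single Row, materialised on the driver
val sample   = myDataFrame.take(10)  // Array[Row] you can step through locally

// By contrast, the body of this map runs on the executors, so a breakpoint
// set inside it is unlikely to be hit by a debugger attached to the driver JVM:
val lengths = myDataFrame.rdd.map(row => row.getLong(0).toString.length).collect()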