How to debug a Scala-based Spark program in IntelliJ IDEA

I am currently setting up my development environment in IntelliJ IDEA. I followed the steps in http://spark.apache.org/docs/latest/quick-start.html exactly.

build.sbt file

name := "Simple Project"

version := "1.0"

scalaVersion := "2.11.7"

libraryDependencies += "org.apache.spark" %% "spark-core" % "2.0.0"

Sample Program File

import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf

object MySpark {

    def main(args: Array[String]): Unit = {
        val logFile = "/IdeaProjects/hello/testfile.txt"
        val conf = new SparkConf().setAppName("Simple Application")
        val sc = new SparkContext(conf)
        val logData = sc.textFile(logFile, 2).cache()
        val numAs = logData.filter(line => line.contains("a")).count()
        val numBs = logData.filter(line => line.contains("b")).count()
        println("Lines with a: %s, Lines with b: %s".format(numAs, numBs))
        sc.stop()
    }
}

If I use the command line:

sbt package

and then

spark-submit --class "MySpark" --master local[4] target/scala-2.11/myspark_2.11-1.0.jar

I am able to generate the jar package, and Spark runs fine.

However, I want to debug the program inside IntelliJ IDEA. How can I set up the configuration so that clicking "Debug" automatically builds the jar package and launches the job by running the "spark-submit" command line?

I just want everything to be as simple as one click on the Debug button in IntelliJ IDEA.

Thanks.

lserlohn asked Oct 05 '16 23:10


1 Answer

First, define the following environment variable:

export SPARK_SUBMIT_OPTS=-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=7777 

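Then create the debug configuration in IntelliJ IDEA as follows:

Run -> Edit Configurations -> click "+" in the top-left corner -> Remote -> set a name and the port (7777, matching the address above)

With that in place, start the Spark application with spark-submit. Because of suspend=y, the JVM waits until a debugger attaches, so next start the debug configuration you just created. Set breakpoints wherever you want execution to pause. (SPARK_SUBMIT_OPTS is read by spark-submit; if you launch with sbt run instead, the same -agentlib option has to reach the JVM some other way.)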
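The steps above could be wrapped in a small shell helper, roughly as sketched here. The function names debug_opts and launch_debug are hypothetical and only for illustration. The class name, master setting and jar path are taken from the question, and the sketch assumes sbt and spark-submit are on the PATH and that it is run from the project root.

```shell
#!/bin/sh
# Sketch only: debug_opts and launch_debug are illustrative names, not part of Spark.

# Build the JDWP agent string for a given port.
# suspend=y makes the JVM wait for a debugger before running main.
debug_opts() {
    printf '%s' "-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=$1"
}

# Package the project, then submit it with the debug agent enabled.
# The port must match the one set in the IntelliJ Remote configuration.
launch_debug() {
    port="${1:-7777}"
    SPARK_SUBMIT_OPTS="$(debug_opts "$port")"
    export SPARK_SUBMIT_OPTS
    sbt package &&
        spark-submit --class "MySpark" --master "local[4]" \
            target/scala-2.11/myspark_2.11-1.0.jar
}

# Show the options that launch_debug would export for the default port.
debug_opts 7777
echo
```

Calling launch_debug (or launch_debug 7777) after defining these functions builds the jar and then blocks until the Remote configuration in IntelliJ attaches on the same port. This is still two actions rather than one click, since the debugger has to be attached from the IDE.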

Sandeep Purohit answered Sep 21 '22 06:09