
Apache Spark app workflow

How do you organize your Spark development workflow?

My way:

  1. Local Hadoop/YARN service.
  2. Local Spark service.
  3. IntelliJ on one screen.
  4. Terminal with a running sbt console.
  5. After I change the Spark app code, I switch to the terminal and run "package" to compile it to a jar, then "submitSpark", an sbt task that runs spark-submit (a sketch of such a task follows this list).
  6. Wait for an exception in the sbt console :)
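
As a rough sketch, such a "submitSpark" task could be wired up in build.sbt so it depends on the packaged jar; the master URL, main class and other details below are placeholders, not the asker's actual setup:

    // build.sbt (sketch): a hypothetical submitSpark task that shells out to spark-submit
    lazy val submitSpark = taskKey[Unit]("Package the app and run spark-submit on the jar")

    submitSpark := {
      // Depend on `package` so the jar is rebuilt before submitting
      val jar = (Compile / packageBin).value
      import scala.sys.process._
      val exitCode = Seq(
        "spark-submit",
        "--master", "yarn",                 // placeholder: local[*], yarn, ...
        "--class", "com.example.MyApp",     // placeholder main class
        jar.getAbsolutePath
      ).!
      if (exitCode != 0) sys.error(s"spark-submit failed with exit code $exitCode")
    }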

I also tried working with spark-shell:

  1. Run the shell and load the previously written app (a sample of such a script is sketched after this list).
  2. Write a line in the shell.
  3. Evaluate it.
  4. If it's fine, copy it to the IDE.
  5. After a few rounds of steps 2-4, paste the code into the IDE, compile the Spark app and start again.
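
For that style of work, the lines usually live in a small script that spark-shell can re-run with :load. A made-up example of such a script (the input path and logic are placeholders):

    // explore.scala (sketch): iterate on this in spark-shell, then copy the stable parts into the IDE
    // In spark-shell:  :load explore.scala
    val lines = sc.textFile("data/sample.txt")                          // placeholder input path
    val words = lines.flatMap(_.split("\\s+")).filter(_.nonEmpty)
    val wordCounts = words.map(word => (word, 1)).reduceByKey(_ + _)
    wordCounts.take(10).foreach(println)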

Is there any way to develop Spark apps faster?

asked Jun 03 '15 by zie1ony

2 Answers

I develop the core logic of our Spark jobs using an interactive environment for rapid prototyping. We use the Spark Notebook running against a development cluster for that purpose.

Once I've prototyped the logic and it's working as expected, I "industrialize" the code in a Scala project, with the classical build lifecycle: create tests; build, package and create artifacts with Jenkins.
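
To illustrate what that "industrializing" step might look like (the dataset, names and logic here are invented, not from the answer), the notebook-prototyped transformation can be moved into a plain function that a unit test, and hence the Jenkins build, can exercise:

    // Sketch: prototype logic factored into a testable function, plus a unit test run during the build
    import org.apache.spark.sql.{DataFrame, SparkSession}
    import org.apache.spark.sql.functions.col
    import org.scalatest.funsuite.AnyFunSuite

    object SessionStats {
      // Hypothetical core logic: number of completed sessions per user
      def completedPerUser(events: DataFrame): DataFrame =
        events.filter(col("status") === "completed").groupBy(col("user")).count()
    }

    class SessionStatsSpec extends AnyFunSuite {
      test("counts only completed sessions per user") {
        val spark = SparkSession.builder().master("local[2]").appName("SessionStatsSpec").getOrCreate()
        import spark.implicits._
        try {
          val events = Seq(("alice", "completed"), ("alice", "aborted"), ("bob", "completed"))
            .toDF("user", "status")
          val result = SessionStats.completedPerUser(events).as[(String, Long)].collect().toMap
          assert(result == Map("alice" -> 1L, "bob" -> 1L))
        } finally spark.stop()
      }
    }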

answered by maasg

I found that writing scripts and using :load / :copy streamlined things a bit, since I didn't need to package anything. If you do use sbt, I suggest you start it and use ~ package so that it automatically packages the jar whenever changes are made. Eventually, of course, everything will end up in an application jar; this approach is for prototyping and exploring (a sketch of the loop follows the list below).

  1. Local Spark
  2. Vim
  3. Spark-Shell
  4. APIs
  5. Console
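
As a rough sketch of that edit-and-reload loop (project layout, jar name and script path are placeholders, not from the answer):

    # terminal 1: have sbt repackage the application jar on every source change
    $ sbt
    > ~ package

    # terminal 2: explore against the freshly packaged classes
    $ spark-shell --jars target/scala-2.12/my-spark-app_2.12-0.1.0.jar
    scala> :load scripts/explore.scala
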
answered by Chris