
Scala IDE for data science applications (like RStudio / Spyder / Rodeo)

With the rise of Spark, Scala has gained tremendous momentum as a programming language of choice for data science applications.

To increase the efficiency when working on data science applications, specialized IDEs have been released for

  • R (e.g. RStudio) and
  • Python (e.g. Spyder or Rodeo, see "Is there something like RStudio for Python?").

Is there something similar for Scala?

majom asked Dec 06 '16 11:12




2 Answers

Unfortunately, there don't seem to be any dedicated data science IDEs for Scala at this time. I think these would be your best options:

IntelliJ Worksheets:

This is basically a text editor with an output window which gets updated as often as you want. Eclipse has something similar; I just prefer IntelliJ.

Pros:

  • Backed by IntelliJ's fantastic code completion, error checking, and sbt/maven integration.
  • You can prototype within the same project setup as your actual development system (if you have one).

Cons:

  • I am not aware of any caching or selective evaluation, so the entire worksheet is evaluated each time you want an answer, which you may not want if some operations take a long time to complete.
  • No workspace variables window or plot integration.
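
To make the re-evaluation point concrete, here is a minimal sketch of what a worksheet might contain. The names (`slowFit`, `weights`, `prediction`) are made up for illustration, and the `Thread.sleep` call merely stands in for an expensive step such as training a model. Assuming the whole worksheet is evaluated top to bottom, as described above, editing only the last line would still rerun the slow step.

```scala
// Hypothetical worksheet contents; all names are illustrative only.

// Stand-in for an expensive step such as fitting a model.
def slowFit(data: Seq[Double]): Double = {
  Thread.sleep(200) // pretend this takes minutes, not milliseconds
  data.sum / data.size
}

val data = Seq(1.0, 2.0, 3.0, 4.0)
val weights = slowFit(data)    // rerun on every full evaluation
val prediction = weights * 2   // tweaking this line still triggers slowFit above
```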

Jupyter Notebooks:

The Jupyter Notebook is a generalization of the IPython notebook which now supports dozens of interpreted languages (new kernels are being added all the time).

Pros:

  • Scala and Spark Scala kernels are fairly easy to install, and both can add maven/sbt dependencies and JARs.
  • The cells in the notebook can be run individually (allowing you to train a model once and use it many times, for example).
  • The cells support markdown (with LaTeX!), which can be rendered on its own (a GitHub example), allowing you to use your notebooks as a report/demonstration.
  • Notebooks are backed by a Notebook Server, so you could easily use a more powerful computer as your notebook server and then interact with the notebook from another location.
  • Some kernels have autocompletion.
  • There appears to be some plot integration (example), but it is not fully polished.

Cons:

  • Not all kernels are perfect; some have bugs or limited functionality.
  • No workspace variables window.
  • You really need to be careful about the ordering of your cells; failure to do so can cause a lot of confusion.
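
The cell-ordering pitfall can be illustrated without a notebook at all. The sketch below is only a model, not how any kernel is implemented: each "cell" is a hypothetical function that reads or writes state shared across the session, and running the same two cells in a different order gives a different answer.

```scala
// Each "cell" is modelled as a function over shared state, the way
// notebook cells share one interpreter session. All names are hypothetical.
object NotebookOrder {
  var scale = 1.0                      // state shared across cells

  def cellA(): Unit = { scale = 10.0 } // "cell 1": sets a parameter
  def cellB(): Double = 2.0 * scale    // "cell 2": uses the parameter

  // Running the cells top to bottom.
  def inOrder(): Double = { scale = 1.0; cellA(); cellB() }

  // Running cell 2 before cell 1 has been executed.
  def outOfOrder(): Double = { scale = 1.0; val r = cellB(); cellA(); r }
}
```

Here `NotebookOrder.inOrder()` and `NotebookOrder.outOfOrder()` disagree even though the "cells" are identical, which is the kind of confusion a restart-and-run-all habit tends to avoid.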

For most of the data science work I do, I use Jupyter, but it is far from perfect. For Scala to really take over as a data science language, it needs more data science libraries (scikit-learn is far ahead here) and a solid plotting library (there are a few options, but none I have seen both uses idiomatic Scala and runs without a server). I think that once it has those two elements it will become more popular, and hopefully someone will make a nice RStudio-esque IDE.

evan.oman answered Dec 18 '22 22:12


Your best shot for Scala is Apache Zeppelin, although it is nothing like RStudio.

Tomer Ben David answered Dec 18 '22 21:12