Scala vs Java for Spark? [closed]

Can someone help me understand why people use Scala over Java for Spark? I have been researching but haven't been able to find a solid answer. I know both work fine, as they both run on the JVM, and I know Scala is both a functional and an OOP language.

Thanks

Asked by Al Elizalde on Jan 11 '16


People also ask

Which is better Scala or Java for Spark?

From the Spark context, many experts prefer Scala over other programming languages because Spark is written in Scala. Scala is the native language of Spark, which means any new API is always available in Scala first.

Is Scala more efficient than Java?

The study concluded that Scala was faster than Java and Go when average developers wrote their code without thinking too much about optimization. The study used the default, idiomatic data structures in each language.

Why Scala is preferred for Spark?

Spark is written in Scala. Scala is not only Spark's programming language; it is also scalable on the JVM. Scala makes it easy for developers to dig into Spark's source code and to access and implement all of the framework's newest features.

Should I learn Scala in 2022?

To advance your programming skills, it's good to learn at least one language from each of several paradigms (imperative, logical, functional, and OOP), and Scala gives you the chance to explore functional and OOP together. The Pragmatic Programmer also advises learning a new programming language every year.


1 Answer

Spark was written in Scala. Spark also came out before Java 8 was available, which made functional programming in Java more cumbersome. In addition, Scala is closer to Python while still running on the JVM. Data scientists were the original target users for Spark, and they traditionally have more of a background in Python, so Scala made more sense for them than going straight to Java.
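
To illustrate the point about functional style, here is a minimal Scala sketch of the kind of inline transformations the Spark RDD API encourages; before Java 8, each of these lambdas would have required a separate anonymous class in Java. The input path, app name, and local master are placeholders, not anything prescribed by Spark.

    import org.apache.spark.{SparkConf, SparkContext}

    object InlineTransformations {
      def main(args: Array[String]): Unit = {
        // Local master and app name are placeholders for running the sketch standalone
        val conf = new SparkConf().setAppName("scala-inline-sketch").setMaster("local[*]")
        val sc = new SparkContext(conf)

        // Transformations are plain Scala lambdas written inline;
        // before Java 8, each would have needed an anonymous class in Java
        val totalLongLineLength = sc
          .textFile("data.txt")          // hypothetical input path
          .map(line => line.length)      // element type is inferred
          .filter(_ > 10)                // keep lines longer than 10 characters
          .reduce(_ + _)                 // sum the remaining lengths

        println(s"Total length of long lines: $totalLongLineLength")
        sc.stop()
      }
    }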

Here is a direct quote from one of the people who originally wrote Spark, taken from a Reddit AMA they did. The question was:

Q:

How important was it to create Spark in Scala? Would it have been feasible / realistic to write it in Java or was Scala fundamental to Spark?

A, from Matei Zaharia:

At the time we started, I really wanted a PL that supports a language-integrated interface (where people write functions inline, etc), because I thought that was the way people would want to program these applications after seeing research systems that had it (specifically Microsoft's DryadLINQ). However, I also wanted to be on the JVM in order to easily interact with the Hadoop filesystem and data formats for that. Scala was the only somewhat popular JVM language then that offered this kind of functional syntax and was also statically typed (letting us have some control over performance), so we chose that. Today there might be an argument to make the first version of the API in Java with Java 8, but we also benefitted from other aspects of Scala in Spark, like type inference, pattern matching, actor libraries, etc.
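
As a rough illustration of the Scala features Matei mentions (inline functions, type inference, and pattern matching), here is a small sketch using Spark's Dataset API, which arrived later than the period he describes. The Event case class and the sample data are made up for the example.

    import org.apache.spark.sql.SparkSession

    // A case class gives Spark a schema and gives us something to pattern match on
    case class Event(user: String, action: String, count: Long)

    object ScalaFeaturesSketch {
      def main(args: Array[String]): Unit = {
        // Local master and app name are placeholders for running the sketch standalone
        val spark = SparkSession.builder()
          .appName("scala-features-sketch")
          .master("local[*]")
          .getOrCreate()
        import spark.implicits._

        // Type inference: the Dataset[Event] type is inferred, not spelled out
        val events = Seq(
          Event("alice", "purchase", 2),
          Event("bob", "click", 5)
        ).toDS()

        // Pattern matching inside an inline function
        val labelled = events.map {
          case Event(user, "purchase", n) => s"$user bought $n item(s)"
          case Event(user, action, n)     => s"$user did $action x$n"
        }

        labelled.show(false) // print results without truncating the strings
        spark.stop()
      }
    }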

Edit

Here's the link in case folks are interested in more of what Matei had to say: https://www.reddit.com/r/IAmA/comments/31bkue/im_matei_zaharia_creator_of_spark_and_cto_at/

Answered by Joe Widen on Sep 17 '22