Can someone help me understand why people is using scala over Java for spark? I have been researching but haven't been able to find a solid answer, I know both works fine as they both run on JVM and I know scala us functional and OOP language.
Thanks
From the Spark context, many experts prefer Scala over other programming languages as Spark is written in Scala. Scala is the native language of Spark. It means any new API always first be available in Scala.
The study concluded that Scala was faster than Java and Go when average developers write their code without thinking about optimization too much. The study used the default, idiomatic data structures in each language.
Spark is written in Scala Scala is not only Spark's programming language, but it's also scalable on JVM. Scala makes it easy for developers to go deeper into Spark's source code to get access and implement all the framework's newest features.
For advancing in programming skills, it's good to learn at least one language from different paradigms, like imperative, logical, functional, and OOP, and Scala gives you a chance to explore both functional and oop together. The Pragmatic Programmer book also advised you to learn a new programming language every year.
Spark was written in Scala. Spark also came out before Java 8 was available which made functional programming more cumbersome. Also, Scala is closer to Python while still running in a JVM. Data Scientists were the original target users for Spark. Data Scientists would traditionally have more of a background in Python, so Scala make more sense for them to use then go straight to Java
Here is direct quote from one of the guys who wrote initially wrote spark from a reddit AMA they did. The question was:
Q:
How important was it to create Spark in Scala? Would it have been feasible / realistic to write it in Java or was Scala fundamental to Spark?
A from Matei Zahara:
At the time we started, I really wanted a PL that supports a language-integrated interface (where people write functions inline, etc), because I thought that was the way people would want to program these applications after seeing research systems that had it (specifically Microsoft's DryadLINQ). However, I also wanted to be on the JVM in order to easily interact with the Hadoop filesystem and data formats for that. Scala was the only somewhat popular JVM language then that offered this kind of functional syntax and was also statically typed (letting us have some control over performance), so we chose that. Today there might be an argument to make the first version of the API in Java with Java 8, but we also benefitted from other aspects of Scala in Spark, like type inference, pattern matching, actor libraries, etc.
Edit
Heres the link incase folks were interested in more on what Matei had to say: https://www.reddit.com/r/IAmA/comments/31bkue/im_matei_zaharia_creator_of_spark_and_cto_at/
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With