When I read spark-1.6 source code of the Master class, the receiveAndReply method seems not to be using Akka. [Cf. here.]
Why is it not using Akka ? And What did they replace Akka with ?
Spark uses Akka basically for scheduling. All the workers request for a task to master after registering. The master just assigns the task. Here Spark uses Akka for messaging between the workers and masters.
Apache Spark is an open-source, distributed processing system used for big data workloads. It utilizes in-memory caching, and optimized query execution for fast analytic queries against data of any size.
When would you not want to use Spark? For multi-user systems, with shared memory, Hive may be a better choice ². For real time, low latency processing, you may prefer Apache Kafka ⁴. With small data sets, it’s not going to give you huge gains, so you’re probably better off with the typical libraries and tools.
Spark has been called a “general purpose distributed data processing engine”1 and “a lightning fast unified analytics engine for big data and machine learning” ². It lets you process big data sets faster by splitting the work up into chunks and assigning those chunks across computational resources.
What is Spark? Spark has been called a “general purpose distributed data processing engine”1 and “a lightning fast unified analytics engine for big data and machine learning” ². It lets you process big data sets faster by splitting the work up into chunks and assigning those chunks across computational resources.
Some of the most common options to set are: The name of your application. This will appear in the UI and in log data. Number of cores to use for the driver process, only in cluster mode. Limit of total size of serialized results of all partitions for each Spark action (e.g. collect).
The motivation behind making Spark independent from Akka are well described in SPARK-5293 which is an umbrella task for Akka related issues.
To quote original description:
Spark depends on Akka, [so] it is not possible for users to rely on different versions, and we have received many requests in the past asking for help about this specific issue. For example, Spark Streaming might be used as the receiver of Akka messages - but our dependency on Akka requires the upstream Akka actors to also use the identical version of Akka.
Since our usage of Akka is limited (mainly for RPC and single-threaded event loop), we can replace it with alternative RPC implementations and a common event loop in Spark.
As you can see, the main reason is simple - to give users more flexibility in creating their own applications.
Also removing complex dependency like Akka, which hasn't been used extensively by Spark anyway, which means lower cost of maintenance.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With