
YARN vs Spark processing engine based on real time application?

I understand what YARN and Spark are, but I want to know when I should use YARN and when I should use the Spark processing engine. What case studies or use cases would help me see the difference between YARN and Spark?

chandu kavar asked Nov 28 '22 16:11


2 Answers

You cannot compare YARN and Spark directly. YARN is a distributed container manager, like Mesos for example, whereas Spark is a data processing tool. Spark can run on YARN, the same way Hadoop MapReduce can run on YARN. It just happens that Hadoop MapReduce ships with Hadoop alongside YARN, whereas Spark does not.

If you mean comparing MapReduce and Spark, I suggest reading this other answer.

matthieun answered Dec 05 '22 09:12


Apache Spark can run on YARN, on Mesos, or in standalone mode.

Spark in standalone mode - all resource management and job scheduling are handled by Spark's own built-in cluster manager.

Spark on YARN - YARN is the resource manager introduced with MRv2 (Hadoop 2). It supports not only native Hadoop workloads but also Spark, Kafka, Elasticsearch, and other custom applications.

Spark on Mesos - Spark also supports Mesos, which is another resource manager.
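
As a rough illustration of how these three modes differ in practice, the choice mostly shows up in the `--master` value passed to `spark-submit`. The helper below is hypothetical (the function name and the host name are placeholders, and the ports are the conventional defaults). It only builds the master string and does not contact any cluster.

```python
def master_url(manager, host=None, port=None):
    """Return a spark-submit --master value for the given cluster manager.

    Hypothetical helper for illustration only; host names are placeholders.
    """
    manager = manager.lower()
    if manager == "yarn":
        # The YARN ResourceManager address is read from the Hadoop
        # configuration (HADOOP_CONF_DIR), so the value is just "yarn".
        return "yarn"
    if manager in ("standalone", "mesos"):
        if host is None:
            raise ValueError(f"{manager} needs a master host")
        if manager == "standalone":
            # 7077 is the conventional default port of a standalone master.
            return f"spark://{host}:{port or 7077}"
        # 5050 is the conventional default port of a Mesos master.
        return f"mesos://{host}:{port or 5050}"
    raise ValueError(f"unknown cluster manager: {manager}")


if __name__ == "__main__":
    for name in ("yarn", "standalone", "mesos"):
        print(name, "->", master_url(name, host="master.example.com"))
```

The application code itself stays the same across the three; only the master setting (and any manager-specific options) changes.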

Advantages of Spark on YARN

  • YARN allows you to dynamically share and centrally configure the same pool of cluster resources between all frameworks that run on YARN.
  • YARN schedulers can be used for Spark jobs.
  • Only with YARN can Spark run against Kerberized Hadoop clusters and use secure authentication between its processes.
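
The scheduler and Kerberos points above can be sketched as a `spark-submit` command line assembled in Python. `--master`, `--deploy-mode`, `--queue`, `--principal`, and `--keytab` are existing `spark-submit` options. The function name, queue name, principal, keytab path, and application file are made-up placeholders, so treat this as a sketch under those assumptions rather than a recommended setup.

```python
import shlex


def yarn_submit_command(app, queue=None, principal=None, keytab=None):
    """Build a spark-submit argument list for a job on YARN.

    Hypothetical helper; all argument values used below are placeholders.
    """
    cmd = ["spark-submit", "--master", "yarn", "--deploy-mode", "cluster"]
    if queue:
        # Submit into a specific YARN scheduler queue.
        cmd += ["--queue", queue]
    if principal and keytab:
        # Kerberos credentials for a secured (Kerberized) Hadoop cluster.
        cmd += ["--principal", principal, "--keytab", keytab]
    cmd.append(app)
    return cmd


if __name__ == "__main__":
    cmd = yarn_submit_command(
        "my_job.py",                                 # placeholder application
        queue="analytics",                           # placeholder queue name
        principal="etl@EXAMPLE.COM",                 # placeholder principal
        keytab="/path/to/etl.keytab",                # placeholder keytab path
    )
    # Print a shell-safe version of the command without running it.
    print(" ".join(shlex.quote(part) for part in cmd))
```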

Link for more documentation on YARN, Spark.

In short: if you want to build a small, simple cluster that is independent of everything else, go for standalone mode. If you want to use an existing Hadoop cluster, go for YARN or Mesos.

Karthik answered Dec 05 '22 10:12