 

What is the exact difference between Spark Local and Standalone mode? [duplicate]

Can someone explain the difference with regard to these factors:

  • Number of nodes / Machines
  • Memory
  • Cores
  • Setup
  • Deployment
  • Advantages of each mode
  • When they should be used
  • Examples if possible

Also, if I am running Spark locally on a single laptop, is that Local mode or Standalone mode?

Nikhil Redij asked Nov 29 '22 21:11



1 Answer

There is a huge difference between standalone and local.

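Local - means that Spark runs on a single machine, with the driver and executors in one JVM, i.e. not distributed. You select it with a master URL such as local or local[*].

Standalone - means that Spark uses its own built-in cluster manager to handle resource management. It can span many machines, but you can also start a master and a worker on a single laptop, so a laptop alone does not decide the mode; the master URL does.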

For Standalone, some background will help you understand what it means. Spark is a distributed application that consumes resources such as memory and CPU. Suppose you run two Spark applications at the same time: they can conflict when allocating resources. For example, the first job may consume all the memory, and the second job would then fail because no memory is left for it.

To resolve this issue, you need a resource manager that makes sure each job gets the resources it needs to run.
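To make the idea concrete, here is a minimal sketch of what a resource manager does when two jobs compete for memory. ToyResourceManager, the job names and the 16 GB capacity are all made up for illustration; this is not Spark's scheduler or any real cluster manager API.

```python
from dataclasses import dataclass, field


@dataclass
class ToyResourceManager:
    """Toy stand-in for a cluster manager (hypothetical, not a Spark API)."""
    total_memory_gb: int
    granted: dict = field(default_factory=dict)

    @property
    def free_memory_gb(self) -> int:
        # Memory that has not been handed out to any application yet.
        return self.total_memory_gb - sum(self.granted.values())

    def request(self, app: str, memory_gb: int) -> bool:
        """Grant the request only if enough memory is still free."""
        if memory_gb <= 0:
            raise ValueError("memory_gb must be positive")
        if memory_gb > self.free_memory_gb:
            return False  # the app has to wait or ask for less
        self.granted[app] = self.granted.get(app, 0) + memory_gb
        return True

    def release(self, app: str) -> None:
        """Return everything held by the app to the pool."""
        self.granted.pop(app, None)


manager = ToyResourceManager(total_memory_gb=16)
first_ok = manager.request("job-1", 12)   # fits: 12 of 16 GB
second_ok = manager.request("job-2", 8)   # refused: only 4 GB are free
manager.release("job-1")
retry_ok = manager.request("job-2", 8)    # fits once job-1 has finished
```

The second job is refused up front instead of failing halfway through, which is the kind of guarantee a real resource manager is meant to give.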

Standalone means that Spark itself manages the resources on the cluster. There are also other resource managers, such as YARN and Mesos. Overall you have 3 options for managing resources on the cluster: Mesos, YARN and Standalone (newer Spark versions also support Kubernetes).
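In practice, the mode is chosen by the master URL you give Spark (for example via --master on spark-submit). The helper below is a rough sketch that maps the common master URL forms to the mode they select. The function name deploy_mode_for is made up for this example, and it only covers the common URL shapes, not every form Spark accepts.

```python
import re


def deploy_mode_for(master: str) -> str:
    """Map a Spark master URL to the kind of deployment it selects.

    Illustrative helper only; Spark does its own parsing internally.
    """
    # local, local[4], local[*], local[4,2]: everything runs on one machine.
    if re.fullmatch(r"local(\[(\*|\d+)(,\s*\d+)?\])?", master):
        return "local"
    # spark://host:port points at a Spark standalone master.
    if master.startswith("spark://"):
        return "standalone"
    if master == "yarn":
        return "yarn"
    if master.startswith("mesos://"):
        return "mesos"
    if master.startswith("k8s://"):
        return "kubernetes"
    raise ValueError(f"unrecognized master URL: {master!r}")


laptop_local = deploy_mode_for("local[*]")
laptop_standalone = deploy_mode_for("spark://localhost:7077")
```

So on a single laptop, local[*] is Local mode, while pointing at a master you started yourself on spark://localhost:7077 is Standalone mode, even though both run on the same machine.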

I would also mention that on a real Hadoop cluster, Spark is usually not the only application running, which means it is not the only consumer of resources. You may also run HBase, Tez or Impala. YARN helps you allocate resources across those applications.

David H answered Apr 18 '23 10:04
