Spark Mesos Cluster Mode using Dispatcher

I have only a single machine and want to run Spark jobs in Mesos cluster mode. It might make more sense to run with a cluster of nodes, but I mainly want to test out Mesos first to check whether it can utilize resources more efficiently (run multiple Spark jobs at the same time without static partitioning). I have tried a number of ways, but without success. Here is what I did:

  1. Build Mesos and run both the Mesos master and slaves (2 slaves on the same machine).

    sudo ./bin/mesos-master.sh --ip=127.0.0.1 --work_dir=/var/lib/mesos
    sudo ./bin/mesos-slave.sh --master=127.0.0.1:5050 --port=5051 --work_dir=/tmp/mesos1
    sudo ./bin/mesos-slave.sh --master=127.0.0.1:5050 --port=5052 --work_dir=/tmp/mesos2
    
  2. Run the Spark Mesos dispatcher.

    sudo ./sbin/start-mesos-dispatcher.sh --master mesos://localhost:5050
    
  3. Then submit the app with the dispatcher as the master URL.

    spark-submit  --master mesos://localhost:7077 <other-config> <jar file>
    
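For reference, when going through the MesosClusterDispatcher the Spark documentation pairs the dispatcher URL with `--deploy-mode cluster`, and the jar must be at a location the cluster nodes can fetch. A hedged sketch (the class name and jar URL below are placeholders, not from this setup):

```shell
# Sketch only: com.example.Main and the jar URL are hypothetical.
# The jar is referenced by a URL reachable from the Mesos agents,
# since in cluster mode the driver is launched inside the cluster.
spark-submit \
  --master mesos://localhost:7077 \
  --deploy-mode cluster \
  --class com.example.Main \
  http://localhost:8000/app.jar
```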

But it doesn't work:

    E0925 17:30:30.158846 807608320 socket.hpp:174] Shutdown failed on fd=61: Socket is not connected [57]
    E0925 17:30:30.159545 807608320 socket.hpp:174] Shutdown failed on fd=62: Socket is not connected [57]

If I use spark-submit --deploy-mode cluster, I get another error message:

    Exception in thread "main" org.apache.spark.deploy.rest.SubmitRestConnectionException: Unable to connect to server

It works perfectly if I don't use the dispatcher but use the Mesos master URL directly: --master mesos://localhost:5050 (client mode). According to the documentation, cluster mode is not supported for Mesos clusters, but another page gives instructions for cluster mode. So it's kind of confusing. My questions are:

  1. How can I get it to work?
  2. Should I use client mode instead of cluster mode if I submit the app/jar directly from the master node?
  3. If I have a single computer, should I spawn 1 or more Mesos slave processes? Basically, I have a number of Spark jobs and don't want to do static partitioning of resources. But when using Mesos without static partitioning, it seems to be much slower?

Thanks.

asked Sep 25 '15 by auxdx



1 Answer

There seem to be two things you're confusing: launching a Spark application in a cluster (as opposed to locally) and launching the driver into the cluster.

From the top of Submitting Applications:

The spark-submit script in Spark’s bin directory is used to launch applications on a cluster. It can use all of Spark’s supported cluster managers through a uniform interface so you don’t have to configure your application specially for each one.

So, Mesos is one of the supported cluster managers and hence you can run Spark apps on a Mesos cluster.

What Mesos, at the time of writing, does not support is launching the driver into the cluster; this is what the command line argument --deploy-mode of ./bin/spark-submit specifies. Since the default value of --deploy-mode is client, you can just omit it, or, if you want to specify it explicitly, use:

./bin/spark-submit --deploy-mode client ...
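A fuller client-mode invocation against the Mesos master from the question might look like this (the example class and jar path are placeholders; adjust to your application):

```shell
# Client mode: the driver runs on the submitting machine and only the
# executors are scheduled by Mesos. Paths below are hypothetical.
./bin/spark-submit \
  --master mesos://localhost:5050 \
  --deploy-mode client \
  --class org.apache.spark.examples.SparkPi \
  /path/to/spark-examples.jar 100
```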
answered Oct 10 '22 by Michael Hausenblas