Spark Mesos Cluster Mode using Dispatcher

I have only a single machine and want to run Spark jobs in Mesos cluster mode. It might make more sense to run with a cluster of nodes, but I mainly want to test out Mesos first to check whether it can utilize resources more efficiently (run multiple Spark jobs at the same time without static partitioning). I have tried a number of ways, but without success. Here is what I did:

  1. Build Mesos and run both the Mesos master and slaves (2 slaves on the same machine).

    sudo ./bin/mesos-master.sh --ip=127.0.0.1 --work_dir=/var/lib/mesos
    sudo ./bin/mesos-slave.sh --master=127.0.0.1:5050 --port=5051 --work_dir=/tmp/mesos1
    sudo ./bin/mesos-slave.sh --master=127.0.0.1:5050 --port=5052 --work_dir=/tmp/mesos2
    
  2. Run the Spark Mesos dispatcher.

    sudo ./sbin/start-mesos-dispatcher.sh --master mesos://localhost:5050
    
  3. Then submit the app with the dispatcher as the master URL.

    spark-submit  --master mesos://localhost:7077 <other-config> <jar file>
    
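For reference, when going through the MesosClusterDispatcher the Spark documentation pairs the dispatcher URL with `--deploy-mode cluster`, and the jar must be at a location the cluster nodes can fetch. A hedged sketch (the class name and jar URL below are placeholders, not from this setup):

```shell
# Sketch only: com.example.Main and the jar URL are hypothetical.
# The jar is referenced by a URL reachable from the Mesos agents,
# since in cluster mode the driver is launched inside the cluster.
spark-submit \
  --master mesos://localhost:7077 \
  --deploy-mode cluster \
  --class com.example.Main \
  http://localhost:8000/app.jar
```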

But it doesn't work:

    E0925 17:30:30.158846 807608320 socket.hpp:174] Shutdown failed on fd=61: Socket is not connected [57]
    E0925 17:30:30.159545 807608320 socket.hpp:174] Shutdown failed on fd=62: Socket is not connected [57]

If I use spark-submit --deploy-mode cluster, I get another error message:

    Exception in thread "main" org.apache.spark.deploy.rest.SubmitRestConnectionException: Unable to connect to server

It works perfectly if I don't use the dispatcher but use the Mesos master URL directly: --master mesos://localhost:5050 (client mode). According to the documentation, cluster mode is not supported for Mesos clusters, but another page gives instructions for cluster mode. So it's kind of confusing. My questions are:

  1. How can I get it to work?
  2. Should I use client mode instead of cluster mode if I submit the app/jar directly from the master node?
  3. If I have a single computer, should I spawn 1 or more Mesos slave processes? Basically, I have a number of Spark jobs and don't want to do static partitioning of resources. But when using Mesos without static partitioning, it seems to be much slower?

Thanks.

asked Sep 25 '15 by auxdx



1 Answer

There seem to be two things you're confusing: launching a Spark application in a cluster (as opposed to locally) and launching the driver into the cluster.

From the top of Submitting Applications:

The spark-submit script in Spark’s bin directory is used to launch applications on a cluster. It can use all of Spark’s supported cluster managers through a uniform interface so you don’t have to configure your application specially for each one.

So, Mesos is one of the supported cluster managers and hence you can run Spark apps on a Mesos cluster.

What Mesos, at the time of writing, does not support is launching the driver into the cluster; this is what the command line argument --deploy-mode of ./bin/spark-submit specifies. Since the default value of --deploy-mode is client, you can just omit it, or, if you want to specify it explicitly, use:

./bin/spark-submit --deploy-mode client ...
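A fuller client-mode invocation against the Mesos master from the question might look like this (the example class and jar path are placeholders; adjust to your application):

```shell
# Client mode: the driver runs on the submitting machine and only the
# executors are scheduled by Mesos. Paths below are hypothetical.
./bin/spark-submit \
  --master mesos://localhost:5050 \
  --deploy-mode client \
  --class org.apache.spark.examples.SparkPi \
  /path/to/spark-examples.jar 100
```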
answered Oct 10 '22 by Michael Hausenblas