 

Understanding resource allocation for spark jobs on mesos

I'm working on a project in Spark, and recently switched from using Spark Standalone to Mesos for cluster management. I now find myself confused about how to allocate resources when submitting a job under the new system.

In standalone mode, I was using something like this (following some recommendations from a Cloudera blog post):

/opt/spark/bin/spark-submit --executor-memory 16G --executor-cores 8 
    --total-executor-cores 240 myscript.py

This is on a cluster where each machine has 16 cores and ~32 GB RAM.

What was nice about this was that I had precise control over the number of executors running and the resources allocated to each. In the example above, I knew I was getting 240/8=30 executors, each with 16GB of memory and 8 cores. Given the memory on each machine in the cluster, this would amount to no more than two executors running on each machine. If I wanted more executors, I could do something like

/opt/spark/bin/spark-submit --executor-memory 10G --executor-cores 5 
    --total-executor-cores 240 myscript.py

This would now give me 240/5=48 executors, each with 5 cores and 10GB memory, and would allow up to 3 executors per machine.
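Just to spell out the arithmetic I'm doing by hand (a purely illustrative Python sketch using my cluster's numbers; nothing Spark computes for you):

# How I reason about the second example above (illustrative only).
total_executor_cores = 240   # --total-executor-cores
executor_cores = 5           # --executor-cores
executor_memory_gb = 10      # --executor-memory

machine_cores = 16
machine_memory_gb = 32

num_executors = total_executor_cores // executor_cores            # 240 / 5 = 48
per_machine_by_cores = machine_cores // executor_cores            # 16 / 5 = 3
per_machine_by_memory = machine_memory_gb // executor_memory_gb   # 32 / 10 = 3
executors_per_machine = min(per_machine_by_cores, per_machine_by_memory)

print(num_executors, executors_per_machine)   # 48 3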

But now that I'm on Mesos, I'm getting a bit confused. First off, I'm running in coarse-grained mode to ensure I can fix and control my resource allocation (this is in the service of a fairly complex model where we want to pre-allocate resources).

Now, I can specify --total-executor-cores and --executor-memory, but the documentation tells me that --executor-cores applies to Spark standalone and YARN only, which makes it difficult to control the total number of executors and the resources allocated to each. Say I run this:

/opt/spark/bin/spark-submit --total-executor-cores 240 --executor-memory 16G --conf spark.mesos.coarse=true myscript.py

When I examine this job in the Mesos web UI, things start getting confusing. So, here are my questions:

  1. Terminology. The Web UI lists "frameworks", which I assume correspond to "jobs" in the standalone UI. But when I click on the detail for a given framework, it lists "tasks". But these can't be actual Spark tasks, right? As far as I can tell, "task" here must actually mean "executor" as far as Spark is concerned. This would be consistent with the UI saying my framework (job) has: 15 active tasks, 240 CPUs, and 264GB memory.

    264/15=17.6, which seems consistent with the 16GB memory per executor I specified (plus some overhead, I guess). Am I right on how I'm interpreting all this?

  2. Assuming yes, when I examine any of these "tasks" (executors) I see that each has 16 cores assigned. Given we have 16 cores per machine, this would seem to indicate I'm basically running one executor on each of 16 machines, and that each executor gets the full 16 cores but only 16 GB of RAM. (Note that even if I drop --executor-memory way down, to something like 4GB, Mesos still runs just one executor per node, with 16 cores and 4GB RAM.) But what I want to accomplish is something like my first two examples: multiple executors per node, each sharing the RAM and cores of that node (i.e., a moderate number of cores per executor, 5-8). Considering I can't specify --executor-cores in Mesos, how do I accomplish this? Or am I off base in even wanting to accomplish this? Will Mesos just not permit multiple executors per node?

asked Dec 11 '15 by moustachio


1 Answer

Question 1: In coarse-grained mode, Spark's executor (org.apache.spark.executor.CoarseGrainedExecutorBackend) is launched as a Mesos task, and the Mesos framework is actually the Spark driver. One Spark driver can submit multiple Spark jobs, depending on your Spark application. Spark and Mesos both come from the AMPLab at UC Berkeley and were developed in parallel, so they use similar terminology (executor, task, ...), which can be confusing :-).
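To make the terminology concrete, here is a minimal PySpark sketch (the data is made up): every action triggers a separate Spark job, but all of them run inside one driver, which Mesos sees as a single framework whose "tasks" are the executors:

from pyspark import SparkContext

sc = SparkContext(appName="one-driver-many-jobs")   # one driver = one Mesos framework

rdd = sc.parallelize(range(1000))

rdd.count()                                  # Spark job 1
rdd.map(lambda x: x * 2).sum()               # Spark job 2
rdd.filter(lambda x: x % 2 == 0).collect()   # Spark job 3

sc.stop()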

Question 2: In coarse-grained mode, Spark launches only one executor per host (see https://issues.apache.org/jira/browse/SPARK-5095 for details). So in your case, Spark will launch one executor per host (each consuming 16 GB of memory and all available cores on the host, which is 16 cores if there is no other workload) until the total number of executor cores reaches 240. That gives 240/16 = 15 executors.
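Expressed as configuration (a sketch; --total-executor-cores corresponds to spark.cores.max and --executor-memory to spark.executor.memory), the submission from the question looks like this, and with 16-core hosts it stops at 240/16 = 15 executors:

from pyspark import SparkConf, SparkContext

conf = (SparkConf()
        .set("spark.mesos.coarse", "true")      # coarse-grained mode
        .set("spark.cores.max", "240")          # cap on total cores across all executors
        .set("spark.executor.memory", "16g"))   # memory per executor

sc = SparkContext(conf=conf)
# Each executor grabs all 16 cores of its host, so Spark launches
# one executor per host until it reaches 240 cores: 15 executors.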

Regarding spark.mesos.mesosExecutor.cores, it only applies to fine-grained mode. In fine-grained mode, Spark launches one executor (org.apache.spark.executor.MesosExecutorBackend) per host. That executor holds spark.mesos.mesosExecutor.cores even when no task is running, and each running task consumes an additional spark.task.cpus cores.
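For comparison, a fine-grained setup would be configured roughly like this (a sketch; only the property names mentioned above are used):

from pyspark import SparkConf, SparkContext

conf = (SparkConf()
        .set("spark.mesos.coarse", "false")            # fine-grained mode
        .set("spark.mesos.mesosExecutor.cores", "1")   # cores each executor holds even when idle
        .set("spark.task.cpus", "1"))                  # additional cores per running task

sc = SparkContext(conf=conf)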

answered Nov 15 '22 by Yong Feng