Spark is not using all configured memory

I'm starting Spark in standalone client mode on a 10-node cluster using Spark 2.1.0-SNAPSHOT.
9 nodes are workers; the 10th is the master and driver. Each node has 256 GB of memory. I'm having difficulty utilizing my cluster fully.

I'm setting the memory limit for the executors and the driver to 200 GB using the following parameters to spark-shell:

spark-shell --executor-memory 200g --driver-memory 200g --conf spark.driver.maxResultSize=200g

When my application starts I can see those values set as expected, both in the console and in the Spark web UI's /environment/ tab.
But when I go to the /executors/ tab I see that my nodes were assigned only 114.3 GB of storage memory each; see the screenshot below.

[Screenshot: Spark web UI /executors/ tab showing 114.3 GB of storage memory per executor]

The total memory shown there adds up to 1.1 TB, while I would expect 2 TB. I double-checked that no other processes were using the memory.
Any idea what the source of this discrepancy is? Did I miss some setting? Is it a bug in the /executors/ tab or in the Spark engine?
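
For reference, a quick way to double-check the effective settings from inside spark-shell; these are standard SparkConf and JVM calls, and the exact heap figure reported will differ somewhat from the -Xmx value:

// Sanity check inside spark-shell (sc is the SparkContext the shell provides)
sc.getConf.get("spark.executor.memory")         // expected: "200g"
sc.getConf.get("spark.driver.memory")           // expected: "200g"
sc.getConf.get("spark.driver.maxResultSize")    // expected: "200g"

// Heap the driver JVM actually reports, in GiB; typically somewhat below -Xmx
Runtime.getRuntime.maxMemory / (1024L * 1024 * 1024)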

asked Aug 24 '16 by jangorecki

1 Answer

You're fully utilizing the memory, but you are only looking at the storage portion of it here. By default, the storage portion is 60% of the total memory.

From the Spark docs:

Memory usage in Spark largely falls under one of two categories: execution and storage. Execution memory refers to that used for computation in shuffles, joins, sorts and aggregations, while storage memory refers to that used for caching and propagating internal data across the cluster.

As of Spark 1.6, execution memory and storage memory are shared (unified), so it's unlikely that you would need to tune the spark.memory.fraction parameter.
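
As a back-of-the-envelope sketch, assuming the unified memory formula Spark has used since 1.6 (roughly the JVM heap minus about 300 MB reserved, times spark.memory.fraction, which defaults to 0.6 in Spark 2.x), the numbers line up with the ~114 GB per executor shown in the UI:

// Rough arithmetic only: the unified memory manager reserves ~300 MB and then
// takes spark.memory.fraction (0.6 by default in Spark 2.x) of the remaining heap.
val reservedBytes = 300L * 1024 * 1024
val fraction      = 0.6
val jvmHeapBytes  = Runtime.getRuntime.maxMemory   // usually reported a bit below -Xmx
val unifiedBytes  = ((jvmHeapBytes - reservedBytes) * fraction).toLong

// With --executor-memory 200g the reported heap comes in somewhat under 200 GB,
// so the unified (execution + storage) region ends up near the ~114 GB shown as
// "Storage Memory" on the /executors/ tab.
println(f"unified region: ${unifiedBytes / math.pow(1024, 3)}%.1f GiB")

If you do decide to cache much more aggressively, spark.memory.fraction (and spark.memory.storageFraction, the share of the unified region protected from eviction) can be raised via --conf, but as noted above that is rarely necessary.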

If you're using YARN, the "Memory Used" and "Memory Total" figures on the ResourceManager's main page reflect the cluster's total memory usage.

[Screenshot: YARN ResourceManager UI showing "Memory Used" and "Memory Total"]

answered Oct 26 '22 by David