Starting spark in standalone client mode on 10 nodes cluster using Spark-2.1.0-SNAPSHOT.
9 nodes are workers, 10th is master and driver. Each 256GB of memory.
I'm having difficuilty to utilize my cluster fully.
I'm setting up memory limit for executors and driver to 200GB using following parameters to spark-shell:
spark-shell --executor-memory 200g --driver-memory 200g --conf spark.driver.maxResultSize=200g
When my application starts I can see those values set as expected both in console and in spark web UI /environment/
tab.
But when I go to /executors/
tab then I see that my nodes got only 114.3GB storage memory assigned, see screen below.
Total memory shown here is then 1.1TB while I would expect to have 2TB. I double checked that other processes were not using the memory.
Any idea what is the source of that discrepancy? Did I miss some setting? Is it a bug in /executors/
tab or spark engine?
You're fully utilizing the memory, but here you are only looking at the storage portion of the memory. By default, the storage portion is 60% of the total memory.
From Spark Docs
Memory usage in Spark largely falls under one of two categories: execution and storage. Execution memory refers to that used for computation in shuffles, joins, sorts and aggregations, while storage memory refers to that used for caching and propagating internal data across the cluster.
As of Spark 1.6, the execution memory and the storage memory is shared, so it's unlikely that you would need to tune the memory.fraction parameter.
If you're using yarn, the main page of the resource manager's "Memory Used" and "Memory Total" will signify the total memory usage.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With