We have tried various combinations of settings, but mpstat shows that all (or most) CPUs are always in use on a single 8-core system.
The following have been tried (a combined sketch of these settings is shown after this list):
- setting the master to local[2]
- setting conf.set("spark.cores.max", "2") in the Spark configuration
- passing --total-executor-cores 2
- passing --executor-cores 2
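For reference, here is a minimal PySpark sketch (not the original submit command; the app name is made up) of how those settings look when combined programmatically in one place:
from pyspark import SparkConf, SparkContext

# Sketch only: combine a 2-thread local master with an explicit core cap.
conf = (
    SparkConf()
    .setAppName("core-limit-test")   # hypothetical app name
    .setMaster("local[2]")           # local mode with 2 worker threads
    .set("spark.cores.max", "2")     # cap the total cores the app may claim
)

sc = SparkContext(conf=conf)
print(sc.defaultParallelism)         # should reflect the 2-thread master
sc.stop()
Note that local[2] limits the number of concurrent task slots rather than OS-level CPU affinity, so background JVM threads (GC, shuffle, networking) can still appear on other cores in mpstat.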
In all cases,
mpstat -A
shows that all of the CPUs are being used, and not just by the master.
So I am at a loss at present. We do need to limit the usage to a specified number of CPUs.
I had the same problem with memory size: I wanted to increase it, but none of the above worked for me either. Based on this user post I was able to resolve my problem, and I think the same approach should also work for the number of cores:
from pyspark import SparkConf, SparkContext
# In Jupyter you have to stop the current context first
sc.stop()
# Create new config
conf = (SparkConf().set("spark.cores.max", "2"))
# Create new context
sc = SparkContext(conf=conf)
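As a quick sanity check (a sketch I added, not part of the linked post), you can confirm that the rebuilt context actually picked up the value before running anything:
# Verify the setting on the new context; expected output: '2'
print(sc.getConf().get("spark.cores.max"))
print(sc.master)   # the master URL the context is actually using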
Hope this helps you. And please, if you have resolved your problem, post your solution as an answer to this question so we can all benefit from it :)
Cheers