 

How to restrict processing to specified number of cores in spark standalone

We have tried various combinations of settings, but mpstat shows that all or most CPUs are always being used (on a single 8-core system).

The following have been tried:

Setting the master to:

local[2]

Calling

conf.set("spark.cores.max","2")

on the Spark configuration.

Also using

--total-executor-cores 2

and

--executor-cores 2

In all cases

mpstat -A

shows that all of the CPUs are being used, and not just by the master.

So I am at a loss at present. We need to limit usage to a specified number of CPUs.
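
For concreteness, the programmatic attempts look roughly like this (an illustrative sketch only; the real job is larger and the settings were tried in various combinations, not necessarily together):

from pyspark import SparkConf, SparkContext

# Illustrative sketch of the settings we tried
conf = (SparkConf()
        .setMaster("local[2]")            # attempt 1: local master with 2 threads
        .set("spark.cores.max", "2"))     # attempt 2: cap total cores (standalone)
sc = SparkContext(conf=conf)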

Asked Apr 30 '15 by WestCoastProjects


1 Answer

I had the same problem with memory size and wanted to increase it, but none of the above worked for me either. Based on this user post I was able to resolve my problem, and I think this should also work for the number of cores:

from pyspark import SparkConf, SparkContext

# In Jupyter you have to stop the current context first
sc.stop()

# Create new config
conf = (SparkConf().set("spark.cores.max", "2"))

# Create new context
sc = SparkContext(conf=conf)
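
As a quick sanity check (my own addition, assuming the context was recreated as above), you can read the setting back from the active context:

# Verify the setting took effect on the new context; should print '2'
print(sc.getConf().get("spark.cores.max"))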

Hope this helps you. And please, if you resolve your problem, post your solution as an answer here so we can all benefit from it :)

Cheers

Answered Sep 20 '22 by ahajib