Spark Parallelism in Standalone Mode

I'm trying to run Spark in standalone mode on my system. The current specification of my system is 8 cores and 32 GB of memory. Based on this article, I calculated the Spark configuration as follows:

spark.driver.memory 2g
spark.executor.cores 3
spark.executor.instances 2
spark.executor.memory 20g
maximizeResourceAllocation TRUE
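
(Side note: programmatically, the spark.* properties above would be applied roughly like this; it is just a sketch, and maximizeResourceAllocation is left out since it appears to be an EMR-specific option rather than a regular Spark property.)

from pyspark import SparkConf, SparkContext

# Sketch: applying the spark.* properties listed above when building the context.
# Note: spark.driver.memory only takes effect if set before the driver JVM starts,
# so in a notebook it normally belongs in spark-defaults.conf or PYSPARK_SUBMIT_ARGS.
conf = (
    SparkConf()
    .setMaster("local[*]")                # or "spark://<master-host>:7077" for a standalone cluster
    .set("spark.driver.memory", "2g")
    .set("spark.executor.cores", "3")
    .set("spark.executor.instances", "2")
    .set("spark.executor.memory", "20g")
)
sc = SparkContext(conf=conf)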

I created the SparkContext in my Jupyter notebook like this and checked the parallelism level:

sc = SparkContext()
sc.defaultParallelism

The default parallelism is giving me 8. My question is: why is it giving me 8 even though I specified 2 executors (3 cores each)? And if that isn't the actual parallelism of my system, how do I get the actual level of parallelism?

Thank you!

asked Oct 17 '22 by Beta

1 Answer

sc.defaultParallelism

returns the default level of parallelism defined on the SparkContext. By default, it is the number of cores available to the application.

To see which settings are pre-applied in your Jupyter notebook, you can print

 sc._conf.getAll()

or, from Scala: sc.getConf.getAll.foreach(println)

The output should include the property

spark.default.parallelism

I think in this case it is preset, which is why you are getting 8.
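
If you want to control the parallelism explicitly instead of inheriting the machine's core count, you can set the master and spark.default.parallelism yourself when creating the context. A rough sketch in PySpark (the value 6 is just an illustrative choice, not something from your setup):

from pyspark import SparkConf, SparkContext

# Illustrative values: cap the local master at 6 cores and set the default
# parallelism explicitly so sc.defaultParallelism reflects it.
conf = (
    SparkConf()
    .setMaster("local[6]")                   # use 6 of the 8 cores
    .set("spark.default.parallelism", "6")   # default number of partitions for RDD operations
)

sc = SparkContext(conf=conf)
print(sc.defaultParallelism)                                      # 6
print(dict(sc._conf.getAll()).get("spark.default.parallelism"))   # '6'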

answered Nov 15 '22 by Ram Ghadiyaram