I'm trying to run Spark in standalone mode on my system. The current specification of my system is 8 cores and 32 GB of memory. Based on this article I calculated the Spark configuration as follows:
spark.driver.memory 2g
spark.executor.cores 3
spark.executor.instances 2
spark.executor.memory 20g
maximizeResourceAllocation TRUE
I created the SparkContext in my Jupyter notebook like this and checked the parallelism level with:
sc = SparkContext()
sc.defaultParallelism
The default parallelism is giving me 8. My question is: why is it giving me 8 even though I configured only 2 executors with 3 cores each? If this doesn't reflect the actual parallelism of my system, then how do I get the actual level of parallelism?
Thank you!
sc.defaultParallelism
returns the default level of parallelism defined on the SparkContext. By default it is the number of cores available to the application.
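For example, with a local master the value simply tracks the number of cores you hand to Spark. A minimal sketch (the master string and expected output are assumptions for illustration):
from pyspark import SparkContext

# Assumed example: local[4] gives the application 4 cores,
# so defaultParallelism should report 4.
sc = SparkContext(master="local[4]", appName="parallelism-demo")
print(sc.defaultParallelism)  # expected: 4
sc.stop()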
To see which settings are pre-applied in your Jupyter notebook, you can print
sc._conf.getAll()
From Scala: sc.getConf.getAll.foreach(println)
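In PySpark that might look like this (a sketch, reusing the sc created in the question; the exact keys and values depend on your setup):
# Print every configuration entry currently set on this context
for key, value in sc._conf.getAll():
    print(key, "=", value)

# Read a single property, falling back to a placeholder when it is not set
print(sc.getConf().get("spark.default.parallelism", "not explicitly set"))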
That should include the property
spark.default.parallelism
I think in your case it is preset, which is why you are getting 8.
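If you want sc.defaultParallelism to follow a value you choose rather than the detected core count, one option is to set spark.default.parallelism on the SparkConf before the context is created. A sketch, assuming a fresh context (the value 6 is only an illustration):
from pyspark import SparkConf, SparkContext

conf = (SparkConf()
        .setAppName("parallelism-demo")
        .set("spark.default.parallelism", "6"))  # illustrative value, not a recommendation
sc = SparkContext(conf=conf)
print(sc.defaultParallelism)  # should now report 6 instead of the machine's core count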