In my `SparkConf` I can set the number of cores to use. I have 4 physical cores (8 logical) on my laptop. What does Spark do if I specify a number that isn't possible on the machine, say 100 cores?
The number of cores doesn't refer to physical cores but to the number of running threads, so nothing strange happens if it is higher than the number of available cores.
Depending on your setup it can actually be a preferred configuration, with a value around twice the number of available cores being a commonly recommended setting. Obviously, if the number is too high, your application will spend more time switching between threads than doing actual processing.
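To see why oversubscription is harmless at the OS level, here is a minimal plain-JVM sketch (not Spark itself; the class name, thread count, and "work" are illustrative) that starts far more threads than the machine has cores. All of them complete; the OS simply time-slices them:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicLong;

public class Oversubscribe {
    // Start `threads` plain JVM threads and sum their ids; returns the total.
    static long runThreads(int threads) throws InterruptedException {
        CountDownLatch done = new CountDownLatch(threads);
        AtomicLong sum = new AtomicLong();
        for (int i = 0; i < threads; i++) {
            final long id = i;
            new Thread(() -> {
                sum.addAndGet(id);   // trivial stand-in for real work
                done.countDown();
            }).start();
        }
        done.await();                // every thread finishes, cores or not
        return sum.get();
    }

    public static void main(String[] args) throws InterruptedException {
        int cores = Runtime.getRuntime().availableProcessors();
        // 100 threads on (say) 8 logical cores: nothing breaks
        System.out.println("cores=" + cores + " sum=" + runThreads(100));
    }
}
```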
It heavily depends on your cluster manager. I assume that you're asking about `local[n]` run mode.
If so, the driver and the one and only executor are the same JVM, with `n` threads.
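For example, a minimal sketch of requesting 100 threads in local mode (the app name is an arbitrary placeholder; this assumes the Spark Java API on the classpath):

```java
SparkConf conf = new SparkConf()
    .setMaster("local[100]")   // 100 scheduler threads, regardless of physical cores
    .setAppName("oversubscribe-demo");
JavaSparkContext sc = new JavaSparkContext(conf);
```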
`DAGScheduler`, the Spark execution planner, will use `n` threads to schedule as many tasks as you've told it it should.
If you have more tasks, i.e. threads, than cores, your OS will have to juggle more threads than cores and schedule them appropriately.
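As a rough plain-JVM analogy (not Spark's actual scheduler; class and method names are illustrative): a fixed pool of `n` worker threads happily accepts far more tasks than it has threads. Excess tasks queue, and the OS time-slices the `n` threads over however many cores exist:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class PoolDemo {
    // Run `tasks` trivial jobs on a pool of `n` threads; returns the completed count.
    static int runTasks(int n, int tasks) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(n);
        AtomicInteger completed = new AtomicInteger();
        for (int i = 0; i < tasks; i++) {
            pool.submit(completed::incrementAndGet);  // queued if all n threads are busy
        }
        pool.shutdown();                       // accept no new tasks, drain the queue
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return completed.get();
    }

    public static void main(String[] args) throws InterruptedException {
        // 100 "scheduler" threads, 1000 tasks: all complete regardless of core count
        System.out.println(runTasks(100, 1000));
    }
}
```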