How does a Spark Executor execute the code? Does it have multiple threads running? If yes, will it open multiple JDBC connections to read/write data from/to RDBMS?
How does a Spark Executor execute the code?
The beauty of open source, the Apache Spark project included, is that you can read the code and find the answer yourself. That's not to say it is the best or only way to find the answer, but my explanation might not be as clear as the code itself (the opposite can also be true :)).
With that said, see the code of Executor yourself.
Does it have multiple threads running?
Yes. See this line where Executor creates a new TaskRunner for every task it launches. A TaskRunner is a Java Runnable, and that Runnable is executed on the executor's thread pool, i.e. on a separate thread.
Quoting the Javadoc of Java's Executors.newCachedThreadPool, which Spark uses for that thread pool:
Creates a thread pool that creates new threads as needed, but will reuse previously constructed threads when they are available, and uses the provided ThreadFactory to create new threads when needed.
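To make the pattern concrete, here is a minimal, self-contained sketch. It is not Spark's actual Executor code (the class and method names are simplified stand-ins); it only shows the idea of wrapping each task in a Runnable and handing it to a cached thread pool, so several tasks run concurrently on separate threads within one JVM.

```scala
import java.util.concurrent.{Executors, ExecutorService, ThreadFactory, TimeUnit}

object ExecutorSketch {

  // Hypothetical stand-in for Spark's TaskRunner: one Runnable per launched task.
  final class TaskRunner(taskId: Long) extends Runnable {
    override def run(): Unit =
      // The real TaskRunner deserializes the task, runs it and reports the result.
      println(s"Running task $taskId on thread ${Thread.currentThread().getName}")
  }

  private val threadFactory = new ThreadFactory {
    override def newThread(r: Runnable): Thread = {
      val t = new Thread(r)
      t.setDaemon(true)
      t
    }
  }

  // Same kind of pool Spark uses: grows on demand, reuses idle threads.
  private val threadPool: ExecutorService = Executors.newCachedThreadPool(threadFactory)

  def launchTask(taskId: Long): Unit =
    threadPool.execute(new TaskRunner(taskId))

  def main(args: Array[String]): Unit = {
    (1L to 4L).foreach(launchTask)      // four "tasks" running in parallel
    threadPool.shutdown()
    threadPool.awaitTermination(5, TimeUnit.SECONDS)
  }
}
```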
If yes, will it open multiple JDBC connections to read/write data from RDBMS?
I'm sure you know the answer already. Yes, it will open multiple connections, and that is why you should be using the foreachPartition operation to "apply a function f to each partition of this Dataset" (the same applies to RDDs), together with some kind of connection pool. See the sketch below.
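Here is a hedged illustration of that idea. The JDBC URL, credentials, column names and the events table are all made up, and DriverManager is used only to keep the example short; in real code you would normally borrow connections from a pool (e.g. HikariCP). The point is that one connection is opened per partition rather than per row, and because several partitions are processed in parallel, several connections can be open at the same time.

```scala
import java.sql.DriverManager
import org.apache.spark.sql.{Row, SparkSession}

val spark = SparkSession.builder().appName("jdbc-write-sketch").getOrCreate()
val df = spark.read.parquet("/data/events")   // hypothetical input DataFrame

df.foreachPartition { (rows: Iterator[Row]) =>
  // This closure runs on an executor thread, once per partition.
  val conn = DriverManager.getConnection(
    "jdbc:postgresql://dbhost:5432/mydb", "user", "secret")
  val stmt = conn.prepareStatement("INSERT INTO events (id, payload) VALUES (?, ?)")
  try {
    rows.foreach { row =>
      stmt.setLong(1, row.getAs[Long]("id"))
      stmt.setString(2, row.getAs[String]("payload"))
      stmt.addBatch()
    }
    stmt.executeBatch()                        // one batched write per partition
  } finally {
    stmt.close()
    conn.close()
  }
}
```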
Yes, if you set spark.executor.cores to more than 1, your executor will run multiple parallel task threads, and yes, I expect multiple JDBC connections will then be opened.
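For example (the instance and core counts below are made-up values), the setting can be passed when building the session or via spark-submit. Note that spark.executor.cores is honored by cluster managers such as YARN or Kubernetes; in local mode the master URL (e.g. local[4]) controls the number of task threads instead.

```scala
import org.apache.spark.sql.SparkSession

// With spark.executor.cores = 4, each executor runs up to 4 task threads at once,
// so up to 4 partitions (and 4 JDBC connections in the sketch above) can be
// active per executor at the same time.
val spark = SparkSession.builder()
  .appName("multi-threaded-executors")
  .config("spark.executor.instances", "2")  // hypothetical cluster sizing
  .config("spark.executor.cores", "4")      // 4 concurrent task slots per executor
  .getOrCreate()
```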