How does a Spark Executor execute the code? Does it have multiple threads running? If yes, will it open multiple JDBC connections to read/write data from/to RDBMS?
How does a Spark Executor execute the code?
The beauty of open source, the Apache Spark project included, is that you can read the code and find the answer yourself. That's not to say it is the best or only way to find the answer, but my explanation might not be as clear as the code itself (the opposite can also be true :)).
With that said, see the code of Executor yourself.
Does it have multiple threads running?
Yes. See this line where Executor creates a new TaskRunner for every task it launches. A TaskRunner is a Java Runnable, and that Runnable is executed on the executor's thread pool, i.e. on a separate thread.
Quoting the Javadoc of Java's Executors.newCachedThreadPool, which Spark uses for that thread pool:
Creates a thread pool that creates new threads as needed, but will reuse previously constructed threads when they are available, and uses the provided ThreadFactory to create new threads when needed.
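To make the pattern concrete, here is a minimal, self-contained sketch. It is not Spark's actual Executor code (the class and method names are simplified stand-ins); it only shows the idea of wrapping each task in a Runnable and handing it to a cached thread pool, so several tasks run concurrently on separate threads within one JVM.

```scala
import java.util.concurrent.{Executors, ExecutorService, ThreadFactory, TimeUnit}

object ExecutorSketch {

  // Hypothetical stand-in for Spark's TaskRunner: one Runnable per launched task.
  final class TaskRunner(taskId: Long) extends Runnable {
    override def run(): Unit =
      // The real TaskRunner deserializes the task, runs it and reports the result.
      println(s"Running task $taskId on thread ${Thread.currentThread().getName}")
  }

  private val threadFactory = new ThreadFactory {
    override def newThread(r: Runnable): Thread = {
      val t = new Thread(r)
      t.setDaemon(true)
      t
    }
  }

  // Same kind of pool Spark uses: grows on demand, reuses idle threads.
  private val threadPool: ExecutorService = Executors.newCachedThreadPool(threadFactory)

  def launchTask(taskId: Long): Unit =
    threadPool.execute(new TaskRunner(taskId))

  def main(args: Array[String]): Unit = {
    (1L to 4L).foreach(launchTask)      // four "tasks" running in parallel
    threadPool.shutdown()
    threadPool.awaitTermination(5, TimeUnit.SECONDS)
  }
}
```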
If yes, will it open multiple JDBC connections to read/write data from RDBMS?
I'm sure you know the answer already. Yes, it will open multiple connections, and that is why you should be using the foreachPartition operation to "apply a function f to each partition of this Dataset" (the same applies to RDDs), together with some kind of connection pool. See the sketch below.
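Here is a hedged illustration of that idea. The JDBC URL, credentials, column names and the events table are all made up, and DriverManager is used only to keep the example short; in real code you would normally borrow connections from a pool (e.g. HikariCP). The point is that one connection is opened per partition rather than per row, and because several partitions are processed in parallel, several connections can be open at the same time.

```scala
import java.sql.DriverManager
import org.apache.spark.sql.{Row, SparkSession}

val spark = SparkSession.builder().appName("jdbc-write-sketch").getOrCreate()
val df = spark.read.parquet("/data/events")   // hypothetical input DataFrame

df.foreachPartition { (rows: Iterator[Row]) =>
  // This closure runs on an executor thread, once per partition.
  val conn = DriverManager.getConnection(
    "jdbc:postgresql://dbhost:5432/mydb", "user", "secret")
  val stmt = conn.prepareStatement("INSERT INTO events (id, payload) VALUES (?, ?)")
  try {
    rows.foreach { row =>
      stmt.setLong(1, row.getAs[Long]("id"))
      stmt.setString(2, row.getAs[String]("payload"))
      stmt.addBatch()
    }
    stmt.executeBatch()                        // one batched write per partition
  } finally {
    stmt.close()
    conn.close()
  }
}
```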
Yes, if you set spark.executor.cores to more than 1, your executor will run multiple parallel task threads, and yes, I expect multiple JDBC connections will then be opened.
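For example (the instance and core counts below are made-up values), the setting can be passed when building the session or via spark-submit. Note that spark.executor.cores is honored by cluster managers such as YARN or Kubernetes; in local mode the master URL (e.g. local[4]) controls the number of task threads instead.

```scala
import org.apache.spark.sql.SparkSession

// With spark.executor.cores = 4, each executor runs up to 4 task threads at once,
// so up to 4 partitions (and 4 JDBC connections in the sketch above) can be
// active per executor at the same time.
val spark = SparkSession.builder()
  .appName("multi-threaded-executors")
  .config("spark.executor.instances", "2")  // hypothetical cluster sizing
  .config("spark.executor.cores", "4")      // 4 concurrent task slots per executor
  .getOrCreate()
```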