
Are Spark executors multi-threaded?

Tags:

apache-spark

How does a Spark Executor execute the code? Does it have multiple threads running? If yes, will it open multiple JDBC connections to read/write data from/to an RDBMS?

Puneet Singh asked Sep 18 '17 05:09


2 Answers

How does a Spark Executor execute the code?

The beauty of open source, the Apache Spark project included, is that you can read the code and find the answer yourself. That's not to say it is the best or only way to find the answer, but my explanation might not be as clear as the code itself (the opposite can also be true :))

With that said, see the code of Executor yourself.

Does it have multiple threads running?

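Yes. See the line where Executor creates a new TaskRunner for each task it is asked to launch. TaskRunner is a Java Runnable, so it is not a thread by itself; the Executor hands it to its thread pool, and the pool runs it on one of its worker threads. Several tasks can therefore run at the same time inside one executor JVM.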

Quoting Java's Executors.newCachedThreadPool that Spark uses for the thread pool:

Creates a thread pool that creates new threads as needed, but will reuse previously constructed threads when they are available, and uses the provided ThreadFactory to create new threads when needed.
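To make the model concrete, here is a rough Python analogue of "tasks as runnables submitted to a thread pool". It is only a sketch: Spark does this on the JVM with a Java thread pool, and the names `run_tasks`, `task_runner` and the `task-worker` thread prefix are made up for illustration.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def run_tasks(num_tasks, cores):
    """Run num_tasks callables on a pool of at most `cores` threads.

    Returns a list of (task_id, thread_name) pairs, one per task.
    """
    def task_runner(task_id):
        # Stand-in for a task body: report which pool thread ran it.
        return task_id, threading.current_thread().name

    # The pool creates worker threads as needed and reuses idle ones,
    # so six tasks on two "cores" share at most two threads.
    with ThreadPoolExecutor(max_workers=cores,
                            thread_name_prefix="task-worker") as pool:
        futures = [pool.submit(task_runner, i) for i in range(num_tasks)]
        return [f.result() for f in futures]


if __name__ == "__main__":
    for task_id, thread_name in run_tasks(num_tasks=6, cores=2):
        print(task_id, thread_name)
```

Each task is just a callable; the pool decides which thread runs it, which is why any state a task touches (such as a database connection) has to be safe to use from several threads at once.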

If yes, will it open multiple JDBC connections to read/write data from/to an RDBMS?

I'm sure you know the answer already. Yes, it will open multiple connections, and that's why you should be using the foreachPartition operation to "apply a function f to each partition of this Dataset" (the same applies to RDDs), together with some kind of connection pool. That way a connection is opened once per partition rather than once per row.
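A minimal sketch of the one-connection-per-partition pattern follows. To keep it runnable without a cluster it makes several assumptions: `sqlite3` stands in for the JDBC database, a plain loop over lists stands in for Spark calling the function once per partition, and the `events` table, `make_partition_writer` and the `opened` counter are hypothetical names used only for this demo.

```python
import os
import sqlite3
import tempfile


def make_partition_writer(db_path, opened):
    """Build a function that writes one whole partition over one connection."""
    def write_partition(rows):
        # One connection per partition, not one per row.
        conn = sqlite3.connect(db_path)
        opened.append(1)  # local bookkeeping for the demo only
        try:
            conn.executemany(
                "INSERT INTO events (id, name) VALUES (?, ?)", rows)
            conn.commit()
        finally:
            conn.close()
    return write_partition


def demo():
    """Write three fake partitions; return (connections_opened, rows_written)."""
    partitions = [[(1, "a"), (2, "b")], [(3, "c")], [(4, "d"), (5, "e")]]
    with tempfile.TemporaryDirectory() as tmp:
        db_path = os.path.join(tmp, "demo.db")
        setup = sqlite3.connect(db_path)
        setup.execute("CREATE TABLE events (id INTEGER, name TEXT)")
        setup.commit()
        setup.close()

        opened = []
        write_partition = make_partition_writer(db_path, opened)
        # Spark would call this once per partition, in parallel threads;
        # here the calls are made one after another.
        for rows in partitions:
            write_partition(iter(rows))

        check = sqlite3.connect(db_path)
        total = check.execute("SELECT COUNT(*) FROM events").fetchone()[0]
        check.close()
    return len(opened), total


if __name__ == "__main__":
    print(demo())
```

In PySpark the same shape of function would be passed as `df.foreachPartition(write_partition)`, where it receives an iterator over the rows of one partition. The `opened` list would not be visible on the driver in that case, since the function runs on the executors.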

Jacek Laskowski answered Oct 07 '22 11:10


Yes, if you set spark.executor.cores to more than 1, then your executor will run multiple tasks in parallel threads, and yes, I guess multiple JDBC connections will then be opened.
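As a back-of-the-envelope check, the number of tasks an executor runs at once is spark.executor.cores divided by spark.task.cpus (which defaults to 1). The sketch below turns that into arithmetic; the helper names are hypothetical, and the connection estimate assumes each running task holds exactly one connection.

```python
def task_slots(executor_cores, task_cpus=1):
    """Tasks one executor can run at the same time."""
    return executor_cores // task_cpus


def max_concurrent_connections(num_executors, executor_cores, task_cpus=1):
    """Upper bound on open connections, assuming one connection per running task."""
    return num_executors * task_slots(executor_cores, task_cpus)


if __name__ == "__main__":
    # 5 executors with 4 cores each and the default spark.task.cpus=1
    print(task_slots(4))
    print(max_concurrent_connections(num_executors=5, executor_cores=4))
```

This is useful for sizing the database side: the bound above should stay below whatever connection limit the RDBMS is configured with.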

Raphael Roth answered Oct 07 '22 13:10