
Is it possible to execute a command on all workers within Apache Spark?

I want to execute a system process on each worker within Spark, and I want this process to run once on each machine. Specifically, the process starts a daemon that must be running before the rest of my program executes. Ideally it should start before I've read any data in.

I'm on Spark 2.0.2 and using dynamic allocation.

Jon asked Nov 29 '16 19:11




1 Answer

You may be able to achieve this with a combination of a lazy val and a Spark broadcast variable, along the lines below. (I have not compiled this code, so you may have to change a few things.)

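import scala.sys.process._

object ProcessManager extends Serializable {
  // Evaluated at most once per executor JVM, on first access.
  lazy val start: Process = Process("your-daemon-command").run() // placeholder: start your process here
}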

You can broadcast this object at the start of your application before you do any transformations.

val pm = sc.broadcast(ProcessManager)

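Now you can access this object inside your transformation, as you would any other broadcast variable, and invoke the lazy val. Note that a lazy val is initialized once per JVM, so the process starts once per executor; if a worker hosts several executors, it starts once in each of them.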

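rdd.mapPartitions { itr =>
  pm.value.start // forces the lazy val; the process is launched only on the first access in this JVM
  // Other stuff here.
  itr // mapPartitions must return an iterator
}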
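
The once-only behaviour this approach relies on can be checked without a cluster. The following is a minimal sketch in plain Scala, not Spark code: `ProcessManagerDemo`, `launches`, and `LazyValOnce` are hypothetical names introduced for illustration, the counter stands in for the real daemon launch, and the threads stand in for tasks running concurrently inside one executor JVM.

```scala
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical stand-in for ProcessManager: the counter replaces the real daemon launch.
object ProcessManagerDemo {
  val launches = new AtomicInteger(0)

  // The initializer runs at most once per JVM, on first access,
  // even when several threads reach it at the same time.
  lazy val start: Int = launches.incrementAndGet()
}

object LazyValOnce {
  def main(args: Array[String]): Unit = {
    // Simulate several tasks on one executor touching the lazy val concurrently.
    val threads = (1 to 8).map { _ =>
      new Thread(new Runnable {
        def run(): Unit = { ProcessManagerDemo.start; () }
      })
    }
    threads.foreach(_.start())
    threads.foreach(_.join())

    println(s"launches = ${ProcessManagerDemo.launches.get}") // prints "launches = 1"
  }
}
```

Under dynamic allocation the same reasoning would apply to each executor as it is added: the first task that touches the lazy val on a new executor triggers the launch there, and later tasks on that executor reuse the already-initialized value.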

Jegan answered Oct 05 '22 08:10