I have a Spark job in which I process a file and then do the following steps:
1. Load the file into a DataFrame
2. Push the DataFrame to Elasticsearch
3. Run some aggregations on the DataFrame and save the results to Cassandra
I have written a Spark job for this which makes the following function calls:
writeToES(df)
writeToCassandra(df)
Right now these two operations run one after the other, but they could run in parallel. How can I do this within a single Spark job?
I could create two Spark jobs, one for writing to ES and one for writing to Cassandra, but they would use multiple ports, which I want to avoid.
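For reference, the sequential version looks roughly like this (a sketch only; the input path and the bodies of writeToES/writeToCassandra stand in for my actual code):

SparkSession spark = SparkSession.builder().appName("my-job").getOrCreate();
Dataset<Row> df = spark.read().json("/path/to/file"); // 1. load the file
writeToES(df);                                        // 2. push to Elasticsearch
writeToCassandra(df);                                 // 3. aggregate and save to Cassandra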
A Spark stage can be understood as a block of computation over the partitions of a distributed collection, one that can execute in parallel across the nodes of a cluster. Spark builds the execution flow of an application out of one or more such stages.
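For instance, a single action triggers one job that Spark splits at the shuffle boundary into two stages, each running its tasks in parallel across the partitions (a hypothetical sketch; the column name and output path are illustrative):

Dataset<Row> counts = df.groupBy("userId").count(); // transformation only: no job yet
counts.write().parquet("/tmp/counts");              // action: one job, two stages (scan/map, then aggregate)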
If you really need to work with multiple Spark contexts, there is a special option, spark.driver.allowMultipleContexts, but it exists only for Spark's internal tests and is not supposed to be used in user programs; you will get unexpected behavior when running more than one Spark context in a single JVM [SPARK-2243].
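For completeness, the switch would be set like this (an illustration only; the option exists in Spark 1.x/2.x, was removed in 3.0, and is discouraged even where available):

SparkConf conf = new SparkConf()
        .setAppName("my-app")
        .set("spark.driver.allowMultipleContexts", "true"); // internal-test option, not for production use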
You cannot run these two actions as a single Spark job; what you are surely looking for is running the two jobs in parallel within the same application.
As the documentation says, you can run multiple jobs in parallel in the same application if those jobs are submitted from different threads:
Inside a given Spark application (SparkContext instance), multiple parallel jobs can run simultaneously if they were submitted from separate threads. By “job”, in this section, we mean a Spark action (e.g. save, collect) and any tasks that need to run to evaluate that action. Spark’s scheduler is fully thread-safe and supports this use case to enable applications that serve multiple requests (e.g. queries for multiple users).
In other words, this should run both actions in parallel (using the CompletableFuture API here, but you can use any async-execution or multithreading mechanism):
CompletableFuture<Void> esFuture = CompletableFuture.runAsync(() -> writeToES(df));
CompletableFuture<Void> cassandraFuture = CompletableFuture.runAsync(() -> writeToCassandra(df));
You can then join on one or both of these futures to wait for completion. As the documentation also notes, pay attention to the configured scheduler mode: with the FAIR scheduler, concurrent jobs share executor resources in a round-robin fashion instead of queueing up FIFO, which lets both writes make progress at the same time:
conf.set("spark.scheduler.mode", "FAIR")
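Putting the pieces together, a minimal end-to-end sketch might look as follows (assuming a plain Java driver; the pool size and input path are illustrative, and writeToES/writeToCassandra are the helpers from the question):

import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class ParallelWrites {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("parallel-writes")
                .config("spark.scheduler.mode", "FAIR") // let concurrent jobs share executors
                .getOrCreate();

        Dataset<Row> df = spark.read().json("/path/to/input"); // placeholder input

        // runAsync defaults to ForkJoinPool.commonPool(); a dedicated pool is
        // safer for long-running, blocking write jobs.
        ExecutorService pool = Executors.newFixedThreadPool(2);

        CompletableFuture<Void> esFuture =
                CompletableFuture.runAsync(() -> writeToES(df), pool);
        CompletableFuture<Void> cassandraFuture =
                CompletableFuture.runAsync(() -> writeToCassandra(df), pool);

        // Block until both writes complete; join() rethrows any failure
        // wrapped in an unchecked CompletionException.
        CompletableFuture.allOf(esFuture, cassandraFuture).join();

        pool.shutdown();
        spark.stop();
    }

    private static void writeToES(Dataset<Row> df) { /* the asker's ES write */ }
    private static void writeToCassandra(Dataset<Row> df) { /* the asker's Cassandra write */ }
}

Note that the scheduler mode only affects how concurrently submitted jobs share resources; the parallelism itself comes from submitting the two actions on separate threads.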