Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Meaning of Exchange in Spark Stage

Can anyone explain me the meaning of exchange in my spark stages in spark DAG. Most of my stages either starts or end in exchange.

1). WholeStageCodeGen -> Exchange 2). Exchange -> WholeStageCodeGen -> SortAggregate -> Exchange

like image 954
Akshat Avatar asked Aug 29 '17 10:08

Akshat


1 Answers

Whole stage code generation is a technique inspired by modern compilers to collapse the entire query into a single function Prior to whole-stage code generation, each physical plan is a class with the code defining the execution. With whole-stage code generation, all the physical plan nodes in a plan tree work together to generate Java code in a single function for execution. This Java code is then turned into JVM bytecode using Janino, a fast Java compiler. Then JVM JIT kicks in to optimize the bytecode further and eventually compiles them into machine instructions.

For example

== Physical Plan ==
*Project [id#27, token#28, token#6]
+- *SortMergeJoin [id#27], [id#5], Inner
   :- *Sort [id#27 ASC NULLS FIRST], false, 0
   :  +- Exchange hashpartitioning(id#27, 200)

Where ever you see *, it means that wholestagecodegen has generated hand written code prior to the aggregation. Exchange means the Shuffle Exchange between jobs.Exchange does not have whole-stage code generation because it is sending data across the network.

like image 128
Nayan Sharma Avatar answered Nov 10 '22 06:11

Nayan Sharma