I came across a variable called queryExecution on a DataFrame object and found the output below in the console, but I am not sure how it can be helpful. Please find the console output below.
scala> df.queryExecution
res5: org.apache.spark.sql.SQLContext#QueryExecution =
== Parsed Logical Plan ==
Project [_1#0 AS ID#2,_2#1 AS Token4#3]
LocalRelation [_1#0,_2#1], [[1,a],[2,b]]
== Analyzed Logical Plan ==
ID: int, Token4: string
Project [_1#0 AS ID#2,_2#1 AS Token4#3]
LocalRelation [_1#0,_2#1], [[1,a],[2,b]]
== Optimized Logical Plan ==
LocalRelation [ID#2,Token4#3], [[1,a],[2,b]]
== Physical Plan ==
LocalTableScan [ID#2,Token4#3], [[1,a],[2,b]]
Code Generation: true
Thanks
To implement Spark SQL, Spark uses an extensible optimizer called Catalyst, based on functional programming constructs in Scala.
At its core, Catalyst contains a general library for representing trees and applying rules to manipulate them.
On top of this framework are built libraries specific to relational query processing (e.g., expressions, logical query plans), and several sets of rules that handle the different phases of query execution: analysis, logical optimization, physical planning, and code generation to compile parts of queries to Java bytecode.
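To make the "trees plus rules" idea concrete, here is a minimal sketch in plain Scala (these are not Spark's actual classes): a tiny expression tree and a single constant-folding rule of the kind Catalyst applies during logical optimization.

```scala
// Hypothetical mini expression tree, loosely mirroring Catalyst's style.
sealed trait Expr
case class Lit(v: Int) extends Expr
case class Attr(name: String) extends Expr
case class Add(l: Expr, r: Expr) extends Expr

// One "rule": a bottom-up transform that evaluates Add over two literals.
def foldConstants(e: Expr): Expr = e match {
  case Add(l, r) =>
    (foldConstants(l), foldConstants(r)) match {
      case (Lit(a), Lit(b)) => Lit(a + b)   // constant folding
      case (fl, fr)         => Add(fl, fr)
    }
  case other => other
}

// (x + (1 + 2))  ==>  (x + 3)
val plan = Add(Attr("x"), Add(Lit(1), Lit(2)))
println(foldConstants(plan))
```

This is the same shape of transformation you can see in the console output above, where the Optimized Logical Plan has collapsed the Project over the LocalRelation into a single LocalRelation.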
Thus queryExecution is an integral part of a Dataset/DataFrame: it represents the query execution that will create and transform your data.
We mainly use it to debug and optimize transformations.
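For example, each phase printed in the console output above can be inspected individually through fields of queryExecution (sketch assuming an existing SparkSession named `spark` and the Spark 2.x+ API; the question's output is from the older SQLContext era, where the fields are the same):

```scala
// Hypothetical DataFrame similar to the one in the question.
val df = spark.range(2).selectExpr("id AS ID", "string(id) AS Token4")

df.queryExecution.logical        // == Parsed Logical Plan ==
df.queryExecution.analyzed       // == Analyzed Logical Plan == (resolved attributes and types)
df.queryExecution.optimizedPlan  // == Optimized Logical Plan == (after Catalyst's rules)
df.queryExecution.executedPlan   // == Physical Plan == (what will actually run)

// Or print all phases at once:
df.explain(true)
```

This is usually more convenient than dumping the whole queryExecution object, since you can compare plans before and after a change to your transformations.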
You can read more in the introduction to Catalyst in the blog post Deep Dive into Spark SQL's Catalyst Optimizer, and in Mastering Apache Spark by @JacekLaskowski:
Query Execution. [WIP]
Debugging query execution. [WIP]