
What is the use of queryExecution in spark dataframe?

I came across a variable called queryExecution on a DataFrame object and got the output below in the console, but I'm not sure how it can be helpful. Please find the console output below.

scala> df.queryExecution
res5: org.apache.spark.sql.SQLContext#QueryExecution =
== Parsed Logical Plan ==
Project [_1#0 AS ID#2,_2#1 AS Token4#3]
 LocalRelation [_1#0,_2#1], [[1,a],[2,b]]

== Analyzed Logical Plan ==
ID: int, Token4: string
Project [_1#0 AS ID#2,_2#1 AS Token4#3]
 LocalRelation [_1#0,_2#1], [[1,a],[2,b]]

== Optimized Logical Plan ==
LocalRelation [ID#2,Token4#3], [[1,a],[2,b]]

== Physical Plan ==
LocalTableScan [ID#2,Token4#3], [[1,a],[2,b]]

Code Generation: true

Thanks

asked Jan 18 '17 by A srinivas

People also ask

What is the use of AQE in Spark?

Spark Adaptive Query Execution (AQE) is a query re-optimization that occurs during query execution. Architecturally, AQE is a framework for dynamically planning and replanning queries based on runtime statistics, which supports a variety of optimizations such as dynamically switching join strategies.

What is the use of createOrReplaceTempView?

createOrReplaceTempView creates or replaces a local temporary view with this DataFrame. The lifetime of this temporary view is tied to the SparkSession that was used to create the DataFrame.
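
A minimal sketch, reusing the (ID, Token4) data from the question; the view name "tokens" is an assumption:

import spark.implicits._                                   // already in scope in spark-shell

val df = Seq((1, "a"), (2, "b")).toDF("ID", "Token4")
df.createOrReplaceTempView("tokens")                       // register (or replace) the view
spark.sql("SELECT Token4 FROM tokens WHERE ID > 1").show() // query the view with plain SQL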

What is adaptive query execution in Spark?

Adaptive Query Execution (AQE) is an optimization technique in Spark SQL that uses runtime statistics to choose the most efficient query execution plan; it has been enabled by default since Apache Spark 3.2.
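
A minimal sketch of checking and toggling the feature; spark.sql.adaptive.enabled is the actual configuration key:

spark.conf.get("spark.sql.adaptive.enabled")            // "true" by default on Spark 3.2+
spark.conf.set("spark.sql.adaptive.enabled", "true")    // explicit opt-in on earlier 3.x versions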

What is the use of SQLContext?

SQLContext is the entry point to Spark SQL, the Spark module for structured data processing. Once the SQLContext is initialised, the user can use it to perform various SQL-like operations over Datasets and DataFrames.
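
A minimal sketch in the pre-2.0 style matching the question's output (since Spark 2.0, SparkSession subsumes SQLContext); the sample data is an assumption:

val sqlContext = new org.apache.spark.sql.SQLContext(sc)  // sc is an existing SparkContext
import sqlContext.implicits._                             // enables .toDF and the $"col" syntax

val df = sc.parallelize(Seq((1, "a"), (2, "b"))).toDF("ID", "Token4")
df.filter($"ID" > 1).show()                               // a "sql-like" operation on a DataFrame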


1 Answer

To implement Spark SQL, Spark provides an extensible optimizer, called Catalyst, based on functional programming constructs in Scala.

At its core, Catalyst contains a general library for representing trees and applying rules to manipulate them.

On top of this framework are built libraries specific to relational query processing (e.g., expressions, logical query plans), and several sets of rules that handle the different phases of query execution: analysis, logical optimization, physical planning, and code generation to compile parts of queries to Java bytecode.

Thus queryExecution is an integral part of a Dataset/DataFrame: it represents the query execution pipeline that will create and transform your data.

We mainly use it to debug and optimize transformations.
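
For instance, each Catalyst phase is exposed as a field on queryExecution; a minimal sketch against the df from the question:

df.queryExecution.logical          // parsed logical plan
df.queryExecution.analyzed         // analyzed logical plan (references and types resolved)
df.queryExecution.optimizedPlan    // plan after the logical optimization rules
df.queryExecution.executedPlan     // physical plan that will actually run

df.explain(true)                   // prints all of the above plans at once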

You can read more about Catalyst in the blog post Deep Dive into Spark SQL’s Catalyst Optimizer, and in Mastering Apache Spark by @JacekLaskowski:

  • Query Execution. [WIP]

  • Debugging query execution. [WIP]

answered Sep 20 '22 by eliasah