I have a system in which component A passes SQL to component B; B then runs the SQL via Apache Spark and returns a result.
For debugging purposes, I'm adding a second communication channel where A can pass SQL to B and request an explanation of the plan.
The code in B looks something like this:
def handleExplain(sql: String, extended: Boolean): String = {
  val dataFrame = sparkContext.sql(sql)
  dataFrame.explain(extended)
}
The problem is that 'explain' doesn't return a string; it just prints the plan to the console. How do I get the string contents of what's printed to the console? Is there another function, or do I have to lift it from the console?
To inspect the schema of a Spark DataFrame, call printSchema() on the DataFrame object. printSchema() prints the schema to the console (stdout), and show() displays the contents of the DataFrame.
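If the text really has to be lifted from the console, one option is to redirect Scala's Console output while the printing call runs. The following is a minimal sketch, not Spark API: `CaptureSketch` and `captureStdout` are hypothetical names, and it assumes the method being wrapped prints through Scala's `println` (it will not catch output written directly to `System.out` from Java code).

```scala
import java.io.{ByteArrayOutputStream, PrintStream}

object CaptureSketch {
  // Hypothetical helper: runs `body` with Scala's Console.out redirected
  // into an in-memory buffer and returns everything that was printed.
  def captureStdout(body: => Unit): String = {
    val buffer = new ByteArrayOutputStream()
    val stream = new PrintStream(buffer, true, "UTF-8")
    try Console.withOut(stream)(body)
    finally stream.flush()
    buffer.toString("UTF-8")
  }
}
```

Under that assumption, a call such as `CaptureSketch.captureStdout { dataFrame.explain(extended) }` would return the printed text as a String. Reading the plan from the DataFrame directly is usually cleaner than redirecting output.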
Every query run in Spark is listed in the SQL tab of the Spark web UI. Clicking an individual query shows its associated execution plan.
After creating the DataFrame, call df.collect() to retrieve all of its data. collect() is an action that returns an Array of Row objects.
All query plans, including their string representations, can be accessed through the corresponding QueryExecution object. For example, to get the full execution plan:
val ds: Dataset[_] = ???
ds.queryExecution.toString
only logical plan:
ds.queryExecution.logical.toString
optimized logical plan:
ds.queryExecution.optimizedPlan.toString
or executed / physical plan:
ds.queryExecution.executedPlan.toString
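Putting this together, the handler from the question could return the plan text directly. This is a minimal sketch under stated assumptions: it takes a SparkSession (Spark 2.0 or later) as a parameter instead of the `sparkContext` used in the question, the `ExplainSketch` object name is hypothetical, and the returned text is not guaranteed to match what `explain` prints character for character (for example, the physical plan comes back without a header line).

```scala
import org.apache.spark.sql.SparkSession

object ExplainSketch {
  // Returns the plan as a String instead of printing it to the console.
  def handleExplain(spark: SparkSession, sql: String, extended: Boolean): String = {
    val queryExecution = spark.sql(sql).queryExecution
    if (extended) queryExecution.toString       // parsed, analyzed, optimized and physical plans
    else queryExecution.executedPlan.toString   // physical plan only
  }
}
```

Calling `ExplainSketch.handleExplain(spark, "SELECT 1 AS x", extended = true)` should give B a string it can send back to A over the debugging channel.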