Is there a way to validate the syntax of raw spark SQL query?
for example, I would like to know if there is any isValid
API call spark provides?
val query = "select * from table"
if(isValid(query)) {
sparkSession.sql(query)
} else {
log.error("Invalid Syntax")
}
I tried the following
val query = "select * morf table" // Invalid query
val parser = spark.sessionState.sqlParser
try{
parser.parseExpression(query)
} catch (ParseException ex) {
throw new Exception(ex); //Exception not getting thrown
}
Dataset<>Row df = sparkSession.sql(query) // Exception gets thrown here
df.writeStream.format("console").start()
Question: parser.parseExpression
is not catching the invalid syntax before I hit the sparkSession.sql
. Other words it is not being helpful in the above code. any reason? My whole goal is to catch syntax errors before I pass it on to sparkSession.sql
Use Spark SQL Parser:
val parser = spark.sessionState.sqlParser
parser.parseExpression("select * from table")
After many research without an answer, I found a way to do it by reading some spark sql code. These lines of code do the job:
val query = "select * from table"
val logicalPlan = spark.sessionState.sqlParser.parsePlan(query) //parse the query and build the AST
val queryExecution = spark.sessionState.executePlan(logicalPlan) // create plans of phases (Analysis, logical optimization plan, physical plan, ...) without any execution
queryExecution.assertAnalyzed() //trigger the Analysis phase
Note that it is in the analysis phase where we have the table resolution, column names resolution etc.. So this basically does a syntax and context analysis
Though it's late but may be helpful for someone else. Using
spark.sessionState.sqlParser
doesn't seems to be much useful as it doesn't complain even for query string as 'Hello World'
Better is to use,
sparkSession.sql(query)
which actually validates Query syntax. Since spark primarily works lazily, ideally below one should not do any operation on actual dataset.
def validateSql(query: String, session: SparkSession): Boolean = {
try {
val result1 = session.sql(query)
println(s"No Syntax Error [$result1] Query [$query]")
return true;
} catch {
case e: Throwable => {
println(s"Query has syntax error. Error [${e.getMessage}] Query [$query]")
return false
}
}
}
If you look Spark logs for above code then it would be something similar to
19/08/07 11:17:09 INFO SparkSqlParser: Parsing command: select * from XYZ limit 5
19/08/07 11:17:10 INFO CatalystSqlParser: Parsing command: bigint
19/08/07 11:17:10 INFO CatalystSqlParser: Parsing command: bigint
19/08/07 11:17:10 INFO CatalystSqlParser: Parsing command: bigint
19/08/07 11:17:10 INFO CatalystSqlParser: Parsing command: bigint
No Syntax Error [[A: bigint, BC: bigint ... 2 more fields]] Query [select * from XYZ limit 5]
Details of same can be found in another question of SO. How to validate SQL expression (in user frontend) without executing it?
However since sparkSession.sql(query) needs table metadata to validate table columns, it's best to point directly to hive meta store. Following are reference to do same programatically.
How to connect to a Hive metastore programmatically in SparkSQL?
How to create pom.xml for maven using SparkSql and Hive?
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With