Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is there a way to validate the syntax of raw spark sql query?

Is there a way to validate the syntax of raw spark SQL query?

for example, I would like to know if there is any isValid API call spark provides?

val query = "select * from table"
if(isValid(query)) {
    sparkSession.sql(query) 
} else {
    log.error("Invalid Syntax")
}

I tried the following

val query = "select * morf table" // Invalid query
val parser = spark.sessionState.sqlParser
try{
    parser.parseExpression(query)
} catch (ParseException ex) {
    throw new Exception(ex); //Exception not getting thrown
}
Dataset<>Row df = sparkSession.sql(query) // Exception gets thrown here
df.writeStream.format("console").start()

Question: parser.parseExpression is not catching the invalid syntax before I hit the sparkSession.sql. Other words it is not being helpful in the above code. any reason? My whole goal is to catch syntax errors before I pass it on to sparkSession.sql

like image 444
user1870400 Avatar asked Feb 19 '18 11:02

user1870400


3 Answers

Use Spark SQL Parser:

val parser = spark.sessionState.sqlParser
parser.parseExpression("select * from table")
like image 119
Yehor Krivokon Avatar answered Sep 24 '22 06:09

Yehor Krivokon


After many research without an answer, I found a way to do it by reading some spark sql code. These lines of code do the job:

val query = "select * from table"
val logicalPlan = spark.sessionState.sqlParser.parsePlan(query) //parse the query and build the AST
val queryExecution = spark.sessionState.executePlan(logicalPlan) // create plans of phases (Analysis, logical optimization plan, physical plan, ...) without any execution
queryExecution.assertAnalyzed() //trigger the Analysis phase

Note that it is in the analysis phase where we have the table resolution, column names resolution etc.. So this basically does a syntax and context analysis

like image 23
Ayoub Omari Avatar answered Sep 22 '22 06:09

Ayoub Omari


Though it's late but may be helpful for someone else. Using

spark.sessionState.sqlParser

doesn't seems to be much useful as it doesn't complain even for query string as 'Hello World'

Better is to use,

sparkSession.sql(query)

which actually validates Query syntax. Since spark primarily works lazily, ideally below one should not do any operation on actual dataset.

  def validateSql(query: String, session: SparkSession): Boolean = {
    try {
      val result1 = session.sql(query)
      println(s"No Syntax Error [$result1] Query [$query]")
      return true;
    } catch {
      case e: Throwable => {
        println(s"Query has syntax error. Error [${e.getMessage}] Query [$query]")
        return false
      }
    }
  }

If you look Spark logs for above code then it would be something similar to

19/08/07 11:17:09 INFO SparkSqlParser: Parsing command: select * from XYZ limit 5
19/08/07 11:17:10 INFO CatalystSqlParser: Parsing command: bigint
19/08/07 11:17:10 INFO CatalystSqlParser: Parsing command: bigint
19/08/07 11:17:10 INFO CatalystSqlParser: Parsing command: bigint
19/08/07 11:17:10 INFO CatalystSqlParser: Parsing command: bigint
No Syntax Error [[A: bigint, BC: bigint ... 2 more fields]] Query [select * from XYZ limit 5]

Details of same can be found in another question of SO. How to validate SQL expression (in user frontend) without executing it?

However since sparkSession.sql(query) needs table metadata to validate table columns, it's best to point directly to hive meta store. Following are reference to do same programatically.

How to connect to a Hive metastore programmatically in SparkSQL?

How to create pom.xml for maven using SparkSql and Hive?

like image 37
tarunkumar Avatar answered Sep 22 '22 06:09

tarunkumar