How to bind variable in Apache Spark SQL? For example:
val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
sqlContext.sql("SELECT * FROM src WHERE col1 = ${VAL1}").collect().foreach(println)
The short answer is no, Spark SQL does not support variables currently. The SQL Server uses T-SQL, which is based on SQL standard extended with procedure programming, local variables and other features. Spark SQL is a pure SQL, partially compatible with SQL standard.
Use a bind variable in PL/SQL to access the variable from SQL*Plus. Bind variables are variables you create in SQL*Plus and then reference in PL/SQL. If you create a bind variable in SQL*Plus, you can use the variable as you would a declared variable in your PL/SQL subprogram and then access the variable from SQL*Plus.
The syntax for assigning a value to a SQL variable within a SELECT query is @ var_name := value , where var_name is the variable name and value is a value that you're retrieving. The variable may be used in subsequent queries wherever an expression is allowed, such as in a WHERE clause or in an INSERT statement.
Spark SQL (as of 1.6 release) does not support bind variables.
ps. What Ashrith is suggesting is not a bind variable.. You're constructing a string every time. Every time Spark will parse the query, create execution plan etc. Purpose of bind variables (in RDBMS systems for example) is to cut time on creating execution plan (which can be costly where there are a lot of joins etc). Spark has to have a special API to "parse" a query and then to "bind" variables. Spark does not have this functionality (as of today, Spark 1.6 release).
Update 8/2018: as of Spark 2.3 there are (still) no bind variables in Spark.
I verified it in both Spark shell 2.x shell and Thrift(beeline) as well. I could able to bind a variable in Spark SQL query with set
command.
Query without bind variable:
select count(1) from mytable;
Query with bind variable (parameterized):
1. Spark SQL shell
set key_tbl=mytable; -- setting mytable to key_tbl to use as ${key_tbl} select count(1) from ${key_tbl};
2. Spark shell
spark.sql("set key_tbl=mytable") spark.sql("select count(1) from ${key_tbl}").collect()
Both w/w.o bind params the query returns an identical result.
Note: Don't give any quotes to the value of key as it's table name here.
Let me know if there are any questions.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With