Dynamically bind a variable/parameter in Spark SQL?

How do I bind a variable in Apache Spark SQL? For example:

val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
sqlContext.sql("SELECT * FROM src WHERE col1 = ${VAL1}").collect().foreach(println)
asked Nov 05 '14 by user3769729

People also ask

How do I add a variable in spark SQL?

The short answer is no: Spark SQL does not currently support variables. SQL Server uses T-SQL, which extends the SQL standard with procedural programming, local variables, and other features. Spark SQL is pure SQL, only partially compatible with the SQL standard.

How do you bind variables in SQL query?

Bind variables are variables you create in SQL*Plus and then reference in PL/SQL. If you create a bind variable in SQL*Plus, you can use it as you would a declared variable in your PL/SQL subprogram and then access it from SQL*Plus.

Can we pass variable in SQL query?

The syntax for assigning a value to a SQL variable within a SELECT query is @var_name := value, where var_name is the variable name and value is the value being retrieved. The variable may then be used in subsequent queries wherever an expression is allowed, such as in a WHERE clause or in an INSERT statement.


2 Answers

Spark SQL (as of the 1.6 release) does not support bind variables.

P.S. What Ashrith is suggesting is not a bind variable: you're constructing a new string every time, so Spark parses the query and creates an execution plan on every call. The purpose of bind variables (in RDBMS systems, for example) is to cut the cost of creating the execution plan (which can be expensive when there are many joins, etc.). Spark would need a special API to "parse" a query once and then "bind" variables to it; Spark does not have this functionality (as of today, the Spark 1.6 release).

Update 8/2018: as of Spark 2.3 there are (still) no bind variables in Spark.
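Since there is no bind-variable API, the common workaround is plain Scala string interpolation, with the caveats above (Spark still re-parses and re-plans on every call). A minimal sketch; the table `src`, column `col1`, the value, and the `spark` session are assumptions for illustration:

```scala
// Workaround sketch: substitute the value via Scala string interpolation.
// This is plain string construction, NOT a true bind variable -- Spark
// re-parses the query and rebuilds the execution plan on every call.
val val1 = "some_value"                                // hypothetical value
val query = s"SELECT * FROM src WHERE col1 = '$val1'"  // assumes a table `src`

// spark.sql(query).collect().foreach(println)         // assumes a SparkSession `spark`
```

Beware that naive interpolation like this is also vulnerable to SQL injection if the value comes from untrusted input.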

answered Oct 05 '22 by Tagar

I verified this in both the Spark 2.x shell and Thrift (beeline). I was able to bind a variable in a Spark SQL query with the set command.

Query without bind variable:

select count(1) from mytable; 

Query with a bound variable (parameterized):

1. Spark SQL shell

 set key_tbl=mytable; -- setting mytable to key_tbl to use as ${key_tbl}
 select count(1) from ${key_tbl};

2. Spark shell

spark.sql("set key_tbl=mytable")
spark.sql("select count(1) from ${key_tbl}").collect()

With or without the bound parameter, the query returns an identical result.

Note: don't put quotes around the value of the key, since it's a table name here.
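For intuition, the set-command mechanism above is a textual substitution that happens before the query reaches the SQL parser. A hypothetical, simplified model of that behavior (not Spark's actual implementation):

```scala
// Hypothetical simplified model of ${var} substitution: every ${key}
// occurrence in the query text is replaced by its value before the
// SQL parser ever sees the query.
def substitute(query: String, vars: Map[String, String]): String =
  vars.foldLeft(query) { case (q, (key, value)) =>
    q.replace("${" + key + "}", value)
  }

val vars = Map("key_tbl" -> "mytable")
val resolved = substitute("select count(1) from ${key_tbl}", vars)
// resolved == "select count(1) from mytable"
```

Because this is text replacement rather than true parameter binding, Spark still parses and plans the resolved query from scratch each time.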

Let me know if there are any questions.

answered Oct 05 '22 by 5 revs