Let's say I have a large table partitioned by dt
field. I want to query this table for data after specific date. E.g.
select * from mytab where dt >= 20140701;
The tricky part is that date is not a constant, but comes from a subquery. So basically I want something like this:
select * from mytab where dt >= (select min(dt) from activedates);
Hive can't do it, however, giving me ParseException
on subquery (from docs I'm guessing it's just not supported yet).
So how do I restrict my query based on dynamic subquery?
Note, that performance is key point here. So the faster, the better, even if it looks uglier.
Also note, that we haven't switched to Hive 0.13 yet, so solutions without in
query are preferred.
Subqueries can be used with SELECT, UPDATE, INSERT, DELETE statements along with expression operator. It could be equality operator or comparison operator such as =, >, =, <= and Like operator. A subquery is a query within another query.
A subquery can be nested inside the WHERE or HAVING clause of an outer SELECT , INSERT , UPDATE , or DELETE statement, or inside another subquery.
Multiple-row subqueries are used most commonly in WHERE and HAVING clauses. Since it returns multiple rows, it must be handled by set comparison operators (IN, ALL, ANY).
Hive decides on the partition pruning when building the execution plan and thus has to have the value of the max(dt)
prior to execution.
Currently the only way to accomplish something like this is breaking the query into two parts, when the first will be select min(dt) from activedates
, its results will be put into a variable.
2nd query will be : select * from mytab where dt >=${hiveconf:var}
.
Now this is a bit tricky.
You could either execute the 1st query into OS variable like so :
a=`hive -S -e "select min(dt) from activedates"`
And then run the 2nnd query like so :
hive -hiveconf var=$a -e "select * from mytab where dt >=${hiveconf:var}"
or event just :
hive -e "select * from mytab where dt >=$a"
Or, if you are using some other scripting language you can replace the variable in the code.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With