Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

HiveQL: Using query results as variables

in Hive I'd like to dynamically extract information from a table, save it in a variable and further use it. Consider the following example, where I retrieve the maximum of column var and want to use it as a condition in the subsequent query.

set maximo=select max(var) from table;

select
  *
from
  table
where
  var=${hiveconf:maximo}

It does not work, although

set maximo=select max(var) from table;

${hiveconf:maximo}

shows me the intended result.

Doing:

select '${hiveconf:maximo}'

gives

"select max(var) from table"

though.

Best

like image 859
MaHo Avatar asked May 13 '16 09:05

MaHo


People also ask

How do you set a variable in Hive query?

Note that there are env and system variables as well, so you can reference ${env:USER} for example. hive> set; Update: I've started to use hivevar variables as well, putting them into hql snippets I can include from hive CLI using the source command (or pass as -i option from command line).

What is HiveQL explain various statements in HiveQL with example?

The Hive Query Language (HiveQL) is a query language for Hive to process and analyze structured data in a Metastore. This chapter explains how to use the SELECT statement with WHERE clause. SELECT statement is used to retrieve the data from a table. WHERE clause works similar to a condition.

Does Hive support in query?

Unfortunately, Hive doesn't support in, exists or subqueries.


2 Answers

Hive substitutes variables as is and does not execute them. Use shell wrapper script to get result into variable and pass it to your Hive script.

maximo=$(hive -e "set hive.cli.print.header=false; select max(var) from table;")
hive -hiveconf "maximo"="$maximo" -f your_hive_script.hql

And after this inside your script you can use select '${hiveconf:maximo}'

like image 80
leftjoin Avatar answered Sep 20 '22 17:09

leftjoin


@Hein du Plessis

Whilst it's not possible to do exactly what you want from Hue -- a constant source of frustration for me -- if you are restricted to Hue, and can't use a shell wrapper as suggested above, there are workarounds depending on the scenario.

When I once wanted to set a variable by selecting the max of a column in a table to use in a query, I got round it like this:

I first put the result into a table comprising two columns, with the (arbitrary word) 'MAX_KEY' in one column and the result of the max calculation in the other, like this:

drop table if exists tam_seg.tbl_stg_temp_max_id;
create table tam_seg.tbl_stg_temp_max_id as
select
    'MAX_KEY' as max_key
    , max(pvw_id) as max_id
from
    tam_seg.tbl_dim_cc_phone_vs_web;

I then added the word 'MAX_KEY' to a sub-query then joined in the above table so I could use the result in the main query:

select
    -- *** here is the joined in value from the table being used ***
    cast(mxi.max_id + qry.temp_id as string) as pvw_id
    , qry.cc_phone_vs_web
from
    (
    select
        snp.cc_phone_vs_web
        , row_number() over(order by snp.cc_phone_vs_web) as temp_id
        -- *** here is the key being added to the sub-query ***
        , 'MAX_KEY' as max_key
    from
        (
        select distinct cc_phone_vs_web from tam_seg.tbl_stg_base_snapshots
        ) as snp
    left outer join
        tam_seg.tbl_dim_cc_phone_vs_web as pvw
        on snp.cc_phone_vs_web = pvw.cc_phone_vs_web
    where
        pvw.cc_phone_vs_web is null
    ) as qry
-- *** here is the table with the select result in being joined in ***
left outer join
    tam_seg.tbl_stg_temp_max_id as mxi
    on qry.max_key = mxi.max_key
;

Not sure if this is your scenario but maybe it can be adapted. I'm 99% sure you can't just put a select statement directly into a variable in Hue though.

If I am doing something in just Hue I would probably do the temporary table and join method. But if I were using a shall wrapper anyway I would definitely do it there.

I hope this helps.

like image 39
free_hat Avatar answered Sep 20 '22 17:09

free_hat