Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Looping using Hiveql

Tags:

hive

hiveql

I'm trying to merge 2 datasets, say A and B. The dataset A has a variable "Flag" which takes 2 values. Rather than jut merging both data together I was trying to merge 2 datasets based on "flag" variable.

The merging code is the following:

create table new_data as
select a.*,b.y
from A as a left join B as b
on a.x=b.x

Since I'm running Hive code through CLI, I'm calling this through the following command

hive -f new_data.hql

The looping part of the code I'm calling to merge data based on "Flag" variable is the following:

for flag in 1 2;
do
  hive -hivevar flag=$flag -f new_data.hql
done

I put the above code in another ".hql" file asn calling it:

hive -f loop_data.hql

But it's throwing error.

cannot recognize input near 'for' 'flag' 'in'

Can anybody please tell me where I'm making mistake.

Thanks!

like image 820
Beta Avatar asked Feb 25 '16 11:02

Beta


1 Answers

  1. You should be adding the loop logic to shell script.

File Name: loop_data.sh

for flag in 1 2;
do
  hive -hivevar flag=$flag -f new_data.hql
done

And execute the script like:

sh loop_data.sh
  1. In your new_data.hql script, you are creating table. Since you should split out the DDL & DML in 2 separate scripts. Like

DDL: create_new_data.hql

create table new_data as
select 
  a.*,
  b.y
from 
  A as a left join 
  B as b on 
  a.x = b.x
where 
  1 = 0;

DML: insert_new_data.hql

insert into new_data 
select 
  a.*,
  b.y
from 
  A as a left join 
  B as b on 
  a.x = b.x
where
  flag = ${hiveconf:flag}

And update you shell script like:

File Name: loop_new_data.sh

# Create table
hive -f create_new_data.hql

# Insert data
for flag in 1 2;
do
  hive -hiveconf flag=$flag -f insert_new_data.hql
done

And execute it like:

sh loop_new_data.sh

Let me know if you want more info.

like image 197
Ambrish Avatar answered Sep 22 '22 00:09

Ambrish