I'm trying to merge two datasets, say A and B. Dataset A has a variable "Flag" which takes two values. Rather than just merging all the data at once, I want to merge the two datasets separately for each value of the "Flag" variable.
The merging code is the following:
create table new_data as
select a.*,b.y
from A as a left join B as b
on a.x=b.x
Since I'm running the Hive code through the CLI, I call it with the following command:
hive -f new_data.hql
The looping part of the code I'm calling to merge data based on "Flag" variable is the following:
for flag in 1 2;
do
hive -hivevar flag=$flag -f new_data.hql
done
I put the above code in another ".hql" file and am calling it with:
hive -f loop_data.hql
But it throws this error:
cannot recognize input near 'for' 'flag' 'in'
Can anybody please tell me where I'm making a mistake?
Thanks!
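The for loop is shell syntax, not HiveQL, so Hive cannot parse it when the file is run with hive -f. Put the loop in a shell script instead.
File Name: loop_data.sh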
for flag in 1 2;
do
hive -hivevar flag=$flag -f new_data.hql
done
And execute the script like:
sh loop_data.sh
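Note that new_data.hql creates the table on every run, so the second iteration fails because new_data already exists, and the query never filters on the flag. To load the data one flag value at a time, split it into a DDL script that creates an empty table (the where 1 = 0 condition returns no rows, so only the structure is created) and a DML script that inserts the rows for one flag value.
DDL: create_new_data.hql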
create table new_data as
select
a.*,
b.y
from
A as a left join
B as b on
a.x = b.x
where
1 = 0;
DML: insert_new_data.hql
insert into new_data
select
a.*,
b.y
from
A as a left join
B as b on
a.x = b.x
where
a.flag = ${hiveconf:flag};
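Hive replaces ${hiveconf:flag} textually before it runs the statement. To see roughly what will be executed for a given value, something like the sketch below may help. preview_query is a hypothetical helper, not part of Hive, and it only approximates Hive's substitution with sed. It assumes the value contains no "/" or "&" characters.

```shell
#!/bin/sh
# Rough preview of a query after variable substitution.
# preview_query is a hypothetical helper for illustration, not a Hive command.
# Usage: preview_query <flag value> <hql file>
preview_query() {
    flag_value=$1
    hql_file=$2
    # Replace every literal ${hiveconf:flag} with the given value.
    # Assumes the value contains no "/" or "&" (both are special to sed here).
    sed "s/\${hiveconf:flag}/$flag_value/g" "$hql_file"
}
```

For example, preview_query 1 insert_new_data.hql prints the insert statement with the placeholder replaced by 1. If the flag were a string column, the placeholder would need quotes around it in the .hql file.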
And update your shell script like:
File Name: loop_new_data.sh
# Create table
hive -f create_new_data.hql
# Insert data
for flag in 1 2;
do
hive -hiveconf flag=$flag -f insert_new_data.hql
done
And execute it like:
sh loop_new_data.sh
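If one of the hive calls fails, the plain loop keeps going with the next flag value. A minimal sketch of the same script that stops at the first failure could look like this. build_new_data and the HIVE_CMD override are hypothetical names introduced here for illustration; the hive options are the ones used above.

```shell
#!/bin/sh
# Sketch: create the table once, then insert once per flag value,
# stopping as soon as any hive call fails.
# HIVE_CMD is a hypothetical override for illustration; it defaults to the hive CLI.
build_new_data() {
    hive_cmd=${HIVE_CMD:-hive}

    # Create the empty table; give up if this fails (e.g. the table already exists).
    "$hive_cmd" -f create_new_data.hql || return 1

    # Append the rows for each flag value given as an argument.
    for flag in "$@"; do
        echo "Inserting rows for flag=$flag" >&2
        if ! "$hive_cmd" -hiveconf flag="$flag" -f insert_new_data.hql; then
            echo "Insert failed for flag=$flag" >&2
            return 1
        fi
    done
}
```

Calling build_new_data 1 2 at the end of the script would reproduce the loop above, and adding a third flag value later only means passing one more argument.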
Let me know if you want more info.