Hive query stuck at 99%

Question

I am inserting records using a left join in Hive. When I set limit 1 the query works, but for all records the query gets stuck at 99% of the reduce job.

The query below works:

    Insert overwrite table tablename select a.id, b.name from a left join b on a.id = b.id limit 1;

But this one does not:

    Insert overwrite table tablename select table1.id, table2.name from table1 left join table2 on table1.id = table2.id;

I have increased the number of reducers, but it still doesn't work.

BushMinusZero · Accepted Answer

Here are a few Hive settings that might help the query optimizer and reduce the amount of data sent across the wire.

    set hive.exec.parallel=true;
    set mapred.compress.map.output=true;
    set mapred.output.compress=true;
    set hive.exec.compress.output=true;
    set hive.cbo.enable=true;
    set hive.compute.query.using.stats=true;
    set hive.stats.fetch.column.stats=true;
    set hive.stats.fetch.partition.stats=true;

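However, I think there's a greater chance that the underlying problem is a skewed key in the join: if a few key values account for most of the rows, the reducers handling those keys finish long after the rest. For a full description of skew and possible workarounds, see https://cwiki.apache.org/confluence/display/Hive/Skewed+Join+Optimization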
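One way to check for skew is to look at how many rows each join key has. This is only a sketch using the table and column names from the question (table1, id); the limit of 20 is an arbitrary choice. Run the same query against table2 as well.

```sql
-- List the join keys with the most rows.
-- A few keys (often NULL or a default value such as 0 or '')
-- with far more rows than the rest suggest a skewed join.
SELECT id, COUNT(*) AS cnt
FROM table1
GROUP BY id
ORDER BY cnt DESC
LIMIT 20;
```

If the top few counts are orders of magnitude larger than the others, the reducers that receive those keys are probably the ones still running at 99%.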

You also mentioned that table1 is much smaller than table2, so you might try a map-side join, depending on your hardware constraints (see https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Joins).
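A rough sketch of what enabling a map-side join could look like with the query from the question. The threshold value here is an assumption for illustration, and which size setting actually applies can depend on the Hive version and on hive.auto.convert.join.noconditionaltask. As far as I understand the Hive join documentation, a left outer join can only be converted when the table on the right side of the join is the small one, so this may not help if only table1 is small.

```sql
-- Let Hive convert eligible joins into map-side joins.
set hive.auto.convert.join=true;
-- Size in bytes below which a table is treated as small
-- (assumed value; tune it to the memory available to your tasks).
set hive.mapjoin.smalltable.filesize=25000000;

INSERT OVERWRITE TABLE tablename
SELECT table1.id, table2.name
FROM table1
LEFT JOIN table2 ON table1.id = table2.id;
```

Running EXPLAIN on the statement should show whether the plan actually contains a map join operator.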

Syntax · Answer

If your query is getting stuck at 99%, check the following:

Data skew: if your data is skewed, one reducer might be doing all the work.
Duplicate keys on both sides: if you have many duplicate join keys on both sides, the output can explode and the query might get stuck.
Small table: if one of your tables is small, try a map join or, if possible, an SMB (sort-merge-bucket) join, which is a huge performance gain over a reduce-side join.
Logs: go to the resource manager log and see how much data the job is reading and writing.
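To see whether duplicate keys would make the output explode, you can estimate the size of the join result before running it. This is a sketch using the names from the question (table1, table2, id). It joins only the per-key counts, so it is usually much cheaper than the real join.

```sql
-- Each table1 row is repeated once per matching table2 row,
-- or kept once if there is no match (left join semantics).
SELECT SUM(t1.cnt * COALESCE(t2.cnt, 1)) AS estimated_output_rows
FROM (SELECT id, COUNT(*) AS cnt FROM table1 GROUP BY id) t1
LEFT JOIN (SELECT id, COUNT(*) AS cnt FROM table2 GROUP BY id) t2
  ON t1.id = t2.id;
```

If the estimate is far larger than the row count of table1, duplicate keys in table2 are multiplying the output.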

Amar · Answer

Hive automatically does some optimizations when it comes to joins, and it loads one side of the join into memory if that side meets the requirements. However, in some cases these jobs get stuck at 99% and never really finish.

I have faced this multiple times, and I have avoided it by explicitly specifying some settings in Hive. Try the settings below and see if they work for you.

    set hive.auto.convert.join=false;
    set mapred.compress.map.output=true;
    set hive.exec.parallel=true;

RobertF · Answer

Make sure you don't have rows with duplicate id values in one of your data tables!

I recently encountered the same issue with a left join's map-reduce process getting stuck on 99% in Hue.

After a little snooping I discovered the root of my problem: there were rows with duplicate member_id matching variables in one of my tables. Left joining all of the duplicate member_ids would have created a new table containing hundreds of millions of rows, consuming more than my allotted memory on our company's Hadoop server.
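A quick way to check a table for duplicate ids is to compare the total row count with the number of distinct ids. This is a sketch only: the table name members is hypothetical, and member_id is the column named above.

```sql
-- If total_rows is larger than distinct_ids, some ids repeat
-- (rows with a NULL member_id also add to the difference,
-- because COUNT(DISTINCT ...) ignores NULLs).
SELECT COUNT(*) AS total_rows,
       COUNT(DISTINCT member_id) AS distinct_ids
FROM members;
```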

Tags: sql, hadoop, hive, mapreduce, hiveql

user2895589
