Hive Job failed with return code 1 from org.apache.hadoop.hive.ql.exec.tez.TezTask and Query Performance

Tags:

hadoop

hive

azure

Every day I run a Hive job that computes aggregations for each quarter of an hour over two months of data. This results in roughly 5760 jobs being submitted to Tez.

The job fails with the following error in stderr:

FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.tez.TezTask

The error can occur after 2300 to 2500 Tez jobs. Just before the error, the YARN logs contain many entries like the following:

2015-12-10 21:53:35,286 INFO [TezChild] task.ContainerReporter: Sleeping for 200ms before retrying getTask again. Got null now. Next getTask sleep message after 2000ms

The execution time of a single job also increases dramatically, from 20s to 100s.

I have no clue what is causing this, and I can't find anything else in the YARN, Hadoop, Hive, or Tez logs (no exceptions, nothing marked as an error).

So I have two questions. Question 1: Where can I find more information, in the logs or elsewhere, that could help me resolve this issue?

Currently we use:

  • The latest version of Azure HDInsight 3.2
  • Jobs are submitted to the cluster with the C# SDK
  • Hive jobs use Tez

Question 2: I'm pretty sure we are not doing our aggregations in a good way. For each aggregation (i.e. for each quarter of an hour), we need to retrieve the previous value of a row. I hoped to use the LAG function, but it does not let us pass a predicate for finding the previous value (we need the previous value that is not greater than the current value). So we could not find any way other than generating one query for each quarter of an hour that we need to compute. Does anyone know how we can do this in a single Hive query?

Thanks in advance for any help, Best regards

asked Dec 11 '15 11:12 by mklotz


1 Answer

Cause: This issue occurs when Kerberos is enabled and the "hive.server2.enable.doAs" property in Hive is set to true. With this property set ("Run as end user instead of Hive user" is true), queries run as the end user who submitted them rather than as the Hive user, which means the end user has to be present locally on every NodeManager. The above error occurs when the end user is not present locally.

Solution: To resolve this issue, create the end user who runs the Hive queries locally on each node, or make that user available through AD/LDAP.
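To confirm whether a given end user is actually visible on a node, something like the following could be run on each NodeManager host. This is only a minimal sketch, not part of Hive or HDInsight: the function name `user_exists_locally` and the command-line usage are made up for illustration. It relies on Python's standard `pwd` module, which resolves accounts through the operating system's usual lookup, so users supplied via AD/LDAP should also be found if the node is configured for that.

```python
import pwd
import sys


def user_exists_locally(username: str) -> bool:
    """Return True if the OS on this node can resolve `username` to an account.

    Hypothetical helper for illustration; it is not part of Hive or YARN.
    """
    try:
        # getpwnam raises KeyError when no matching account entry is found.
        pwd.getpwnam(username)
        return True
    except KeyError:
        return False


if __name__ == "__main__":
    # Usage (assumed): python check_user.py alice bob
    for name in sys.argv[1:]:
        status = "present" if user_exists_locally(name) else "MISSING"
        print(f"{name}: {status}")
```

Any user reported as MISSING on a node would need to be created there (or exposed through AD/LDAP) before queries run as that user can succeed.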

answered Oct 31 '22 17:10 by Sajawal Nadeem