Oozie and Job History Server configuration problems

Problem

I'm trying to install pseudo-distributed CDH without using CDM. Everything "works" from the console. However, as soon as I start using Hue, I get an error when trying to work with Pig.

The error shown in Hue is:

JA017: Could not lookup launched hadoop Job ID [job_local2125047777_0001] which was associated with action [0000000-160112011607704-oozie-oozi-W@pig]. Failing this action!

I believe this error originates from a miscommunication in the Oozie workflow when it connects Pig to the Job History Server.

Prior to this, I was unable to use Hive from Hue because Oozie had difficulty installing its sharelib on HDFS. I resolved this by creating a symbolic link between /etc/hadoop/conf/core-site.xml and /etc/oozie/conf/hadoop-conf/core-site.xml, as suggested here: Apache Oozie failed loading ShareLib
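A minimal sketch of that symlink step, assuming the link is created at Oozie's path and points at Hadoop's copy (the direction is not stated above, so this is an assumption). `link_conf` is a hypothetical helper name, not an Oozie or CDH command.

```shell
# Sketch only: make one config path a symlink to another.
# link_conf is a hypothetical helper, not part of Oozie or CDH.
link_conf() {
  target="$1"   # existing file, e.g. /etc/hadoop/conf/core-site.xml
  link="$2"     # link to create, e.g. /etc/oozie/conf/hadoop-conf/core-site.xml
  if [ ! -f "$target" ]; then
    echo "missing target: $target" >&2
    return 1
  fi
  # -s creates a symbolic link; -f replaces an existing file or link at "$link"
  ln -sf "$target" "$link"
}

# Assumed usage (as root, followed by an Oozie restart):
# link_conf /etc/hadoop/conf/core-site.xml /etc/oozie/conf/hadoop-conf/core-site.xml
```

Refusing to link when the target is missing avoids leaving a dangling link that Oozie would then fail to read.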

Script information

The configuration script that I've written to install CDH onto Scientific Linux 7 is available here: https://github.com/coatless/stat490uiuc/blob/master/install_scripts/cdh_build.sh

Specifically, I am trying to obtain results from the following Pig script:

data = LOAD '/user/hue/pig/examples/data/midsummer.txt' as (text:CHARARRAY);

upper_case = FOREACH data GENERATE org.apache.pig.piggybank.evaluation.string.UPPER(text);

STORE upper_case INTO '$output' ;

Attempted Solutions

From googling, I've come across the following suggested solutions. I implemented each of them, and none worked.

  • JA017: Could not lookup launched hadoop Job ID

This thread suggested running the following commands:

sudo -u hdfs hadoop fs -mkdir -p /user/history
sudo -u hdfs hadoop fs -chmod -R 1777 /user/history
sudo -u hdfs hadoop fs -chown mapred:hadoop /user/history

I restarted the ResourceManager, NodeManager, HDFS, and the History Server, to no avail.

In the same thread, another user suggested setting user.name=mapred in job.properties. However, I could not find any reference to job.properties for Hue jobs.

  • Oozie logs report Unknown hadoop job and history server UI not populated

This post suggests declaring fixed paths for the history server in the mapred-site.xml file:

<property>
  <name>mapreduce.jobhistory.done-dir</name>
  <value>/user/history/done</value>
</property>
<property>
   <name>mapreduce.jobhistory.intermediate-done-dir</name>
   <value>/user/history/done_intermediate</value>
</property>

This also did not work.

  • JA017: Unknown hadoop job

This thread indicates the issue may be related to a permissions problem; however, the user does not provide specifics on how the problem was resolved.

Any help would be appreciated.

Full Oozie log

Full error text from the oozie.log file:

2016-01-11 23:51:59,195  WARN ParameterVerifier:523 - SERVER[server-name] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[-] ACTION[-] The application does not define formal parameters in its XML definition
2016-01-11 23:51:59,275  WARN LiteWorkflowAppService:523 - SERVER[server-name] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[-] ACTION[-] libpath [hdfs://localhost:8020/user/hue/oozie/workspaces/_cloudera_-oozie-1-1452577913.73/lib] does not exist
2016-01-11 23:51:59,572  INFO ActionStartXCommand:520 - SERVER[server-name] USER[cloudera] GROUP[-] TOKEN[] APP[pig-app-hue-script] JOB[0000000-160111235108256-oozie-oozi-W] ACTION[0000000-160111235108256-oozie-oozi-W@:start:] Start action [0000000-160111235108256-oozie-oozi-W@:start:] with user-retry state : userRetryCount [0], userRetryMax [0], userRetryInterval [10]
2016-01-11 23:51:59,595  INFO ActionStartXCommand:520 - SERVER[server-name] USER[cloudera] GROUP[-] TOKEN[] APP[pig-app-hue-script] JOB[0000000-160111235108256-oozie-oozi-W] ACTION[0000000-160111235108256-oozie-oozi-W@:start:] [***0000000-160111235108256-oozie-oozi-W@:start:***]Action status=DONE
2016-01-11 23:51:59,596  INFO ActionStartXCommand:520 - SERVER[server-name] USER[cloudera] GROUP[-] TOKEN[] APP[pig-app-hue-script] JOB[0000000-160111235108256-oozie-oozi-W] ACTION[0000000-160111235108256-oozie-oozi-W@:start:] [***0000000-160111235108256-oozie-oozi-W@:start:***]Action updated in DB!
2016-01-11 23:52:00,052  INFO ActionStartXCommand:520 - SERVER[server-name] USER[cloudera] GROUP[-] TOKEN[] APP[pig-app-hue-script] JOB[0000000-160111235108256-oozie-oozi-W] ACTION[0000000-160111235108256-oozie-oozi-W@pig] Start action [0000000-160111235108256-oozie-oozi-W@pig] with user-retry state : userRetryCount [0], userRetryMax [0], userRetryInterval [10]
2016-01-11 23:52:03,487  WARN Credentials:96 - SERVER[server-name] Null token ignored for oozie mr token
2016-01-11 23:52:03,506  WARN Credentials:96 - SERVER[server-name] Null token ignored for oozie mr token
2016-01-11 23:52:03,562  WARN JobResourceUploader:64 - SERVER[server-name] Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
2016-01-11 23:52:03,563  WARN JobResourceUploader:171 - SERVER[server-name] No job jar file set.  User classes may not be found. See Job or Job#setJar(String).
2016-01-11 23:52:04,169  WARN MRApps:582 - SERVER[server-name] cache file (mapreduce.job.cache.files) hdfs://localhost:8020/user/oozie/share/lib/lib_20160111222734/pig/json-simple-1.1.jar conflicts with cache file (mapreduce.job.cache.files) hdfs://localhost:8020/user/oozie/share/lib/lib_20160111222734/oozie/json-simple-1.1.jar This will be an error in Hadoop 2.0
2016-01-11 23:52:08,611  WARN Credentials:96 - SERVER[server-name] Null token ignored for oozie mr token
2016-01-11 23:52:08,618  WARN PigActionExecutor:523 - SERVER[server-name] USER[cloudera] GROUP[-] TOKEN[] APP[pig-app-hue-script] JOB[0000000-160111235108256-oozie-oozi-W] ACTION[0000000-160111235108256-oozie-oozi-W@pig] Exception in check(). Message[JA017: Could not lookup launched hadoop Job ID [job_local1961106749_0001] which was associated with  action [0000000-160111235108256-oozie-oozi-W@pig].  Failing this action!]
org.apache.oozie.action.ActionExecutorException: JA017: Could not lookup launched hadoop Job ID [job_local1961106749_0001] which was associated with  action [0000000-160111235108256-oozie-oozi-W@pig].  Failing this action!
       at org.apache.oozie.action.hadoop.JavaActionExecutor.check(JavaActionExecutor.java:1274)
       at org.apache.oozie.action.hadoop.JavaActionExecutor.start(JavaActionExecutor.java:1203)
       at org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:250)
       at org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:64)
       at org.apache.oozie.command.XCommand.call(XCommand.java:286)
       at org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:321)
       at org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:250)
       at org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:175)
       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
       at java.lang.Thread.run(Thread.java:745)
2016-01-11 23:52:08,620  WARN ActionStartXCommand:523 - SERVER[server-name] USER[cloudera] GROUP[-] TOKEN[] APP[pig-app-hue-script] JOB[0000000-160111235108256-oozie-oozi-W] ACTION[0000000-160111235108256-oozie-oozi-W@pig] Error starting action [pig]. ErrorType [FAILED], ErrorCode [JA017], Message [JA017: Could not lookup launched hadoop Job ID [job_local1961106749_0001] which was associated with  action [0000000-160111235108256-oozie-oozi-W@pig].  Failing this action!]
org.apache.oozie.action.ActionExecutorException: JA017: Could not lookup launched hadoop Job ID [job_local1961106749_0001] which was associated with  action [0000000-160111235108256-oozie-oozi-W@pig].  Failing this action!
       at org.apache.oozie.action.hadoop.JavaActionExecutor.check(JavaActionExecutor.java:1274)
       at org.apache.oozie.action.hadoop.JavaActionExecutor.start(JavaActionExecutor.java:1203)
       at org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:250)
       at org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:64)
       at org.apache.oozie.command.XCommand.call(XCommand.java:286)
       at org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:321)
       at org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:250)
       at org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:175)
       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
       at java.lang.Thread.run(Thread.java:745)
2016-01-11 23:52:08,621  WARN ActionStartXCommand:523 - SERVER[server-name] USER[cloudera] GROUP[-] TOKEN[] APP[pig-app-hue-script] JOB[0000000-160111235108256-oozie-oozi-W] ACTION[0000000-160111235108256-oozie-oozi-W@pig] Failing Job due to failed action [pig]
2016-01-11 23:52:08,623  WARN LiteWorkflowInstance:523 - SERVER[server-name] USER[cloudera] GROUP[-] TOKEN[] APP[pig-app-hue-script] JOB[0000000-160111235108256-oozie-oozi-W] ACTION[0000000-160111235108256-oozie-oozi-W@pig] Workflow Failed. Failing node [pig]
2016-01-11 23:52:08,768  INFO KillXCommand:520 - SERVER[server-name] USER[cloudera] GROUP[-] TOKEN[] APP[pig-app-hue-script] JOB[0000000-160111235108256-oozie-oozi-W] ACTION[] STARTED WorkflowKillXCommand for jobId=0000000-160111235108256-oozie-oozi-W
2016-01-11 23:52:08,806  INFO KillXCommand:520 - SERVER[server-name] USER[cloudera] GROUP[-] TOKEN[] APP[pig-app-hue-script] JOB[0000000-160111235108256-oozie-oozi-W] ACTION[] ENDED WorkflowKillXCommand for jobId=0000000-160111235108256-oozie-oozi-W
2016-01-11 23:52:09,038  INFO CallbackServlet:520 - SERVER[server-name] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[0000000-160111235108256-oozie-oozi-W] ACTION[0000000-160111235108256-oozie-oozi-W@pig] callback for action [0000000-160111235108256-oozie-oozi-W@pig]
2016-01-11 23:52:09,072 ERROR CompletedActionXCommand:517 - SERVER[server-name] USER[-] GROUP[-] TOKEN[] APP[-] JOB[0000000-160111235108256-oozie-oozi-W] ACTION[0000000-160111235108256-oozie-oozi-W@pig] XException,
org.apache.oozie.command.CommandException: E0800: Action it is not running its in [FAILED] state, action [0000000-160111235108256-oozie-oozi-W@pig]
       at org.apache.oozie.command.wf.CompletedActionXCommand.eagerVerifyPrecondition(CompletedActionXCommand.java:92)
       at org.apache.oozie.command.XCommand.call(XCommand.java:257)
       at org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:175)
       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
       at java.lang.Thread.run(Thread.java:745)
2016-01-11 23:52:09,082  WARN CallableQueueService$CallableWrapper:523 - SERVER[server-name] USER[-] GROUP[-] TOKEN[] APP[-] JOB[0000000-160111235108256-oozie-oozi-W] ACTION[0000000-160111235108256-oozie-oozi-W@pig] exception callable [callback], E0800: Action it is not running its in [FAILED] state, action [0000000-160111235108256-oozie-oozi-W@pig]
org.apache.oozie.command.CommandException: E0800: Action it is not running its in [FAILED] state, action [0000000-160111235108256-oozie-oozi-W@pig]
       at org.apache.oozie.command.wf.CompletedActionXCommand.eagerVerifyPrecondition(CompletedActionXCommand.java:92)
       at org.apache.oozie.command.XCommand.call(XCommand.java:257)
       at org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:175)
       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
       at java.lang.Thread.run(Thread.java:745)
Asked by coatless on Jan 12 '16 at 08:01


1 Answer

You should double-check, using the Hue File Browser, whether the permissions are correct on ALL directories and subdirectories of /user/history.

In my case, all users had permissions on all subfolders of /user/history, but the Hue File Browser showed that the /user/history directory itself had the following permissions:

Name        User     Group     Permissions
history     mapred   hadoop    drwxrwx--- 

This caused the error when running as any user other than mapred. The following command helped:

sudo -u hdfs hadoop fs -chmod 777 /user/history
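As a rough way to spot this situation from the console instead of the File Browser, the sketch below inspects a permission string such as the one in the listing above and reports whether users outside the owner and group can read and enter the directory. `others_can_enter` is a hypothetical helper name, and the `hadoop fs -ls -d` usage in the comment is an assumption about how one might obtain the string.

```shell
# Sketch only: does the "other" class have read and execute on a directory?
# others_can_enter is a hypothetical helper, not a Hadoop command.
others_can_enter() {
  mode="$1"                                   # e.g. drwxrwx---
  # Characters 8-10 of the mode string are the "other" permission bits.
  other=$(printf '%s' "$mode" | cut -c8-10)
  case "$other" in
    r?x|r?t) echo yes ;;                      # t = execute plus sticky bit
    *)       echo no ;;
  esac
}

# Assumed usage against HDFS:
# others_can_enter "$(hadoop fs -ls -d /user/history | awk '{print $1}')"
```

For the listing shown above (drwxrwx---) this prints "no", which matches the failure seen by users that are neither mapred nor members of the hadoop group.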
Answered by stefan.m on Nov 05 '22 at 14:11