Issue: We are trying to run few commands on a particular host machine of our cluster. We chose SSH Action for the same. We have been facing this SSH issue for some time now. What might be the real issue here? Please point me towards the solution.
logs:
AUTH_FAILED: Not able to perform operation [ssh -o PasswordAuthentication=no -o KbdInteractiveDevices=no -o StrictHostKeyChecking=no -o ConnectTimeout=20 [email protected] mkdir -p oozie-oozi/0000000-131008185935754-oozie-oozi-W/action1--ssh/ ] | ErrorStream: Warning: Permanently added host,1.2.3.4 (RSA) to the list of known hosts. Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).
org.apache.oozie.action.ActionExecutorException: AUTH_FAILED: Not able to perform operation [ssh -o PasswordAuthentication=no -o KbdInteractiveDevices=no -o StrictHostKeyChecking=no -o ConnectTimeout=20 [email protected] mkdir -p oozie-oozi/0000000-131008185935754-oozie-oozi-W/action1--ssh/ ] | ErrorStream: Warning: Permanently added 1.2.3.4,192.168.34.208 (RSA) to the list of known hosts. Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).
at org.apache.oozie.action.ssh.SshActionExecutor.execute(SshActionExecutor.java:589)
at org.apache.oozie.action.ssh.SshActionExecutor.start(SshActionExecutor.java:204)
at org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:211)
at org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:59)
at org.apache.oozie.command.XCommand.call(XCommand.java:277)
at org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:326)
at org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:255)
at org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:175)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.IOException: Not able to perform operation [ssh -o PasswordAuthentication=no -o KbdInteractiveDevices=no -o StrictHostKeyChecking=no -o ConnectTimeout=20 [email protected] mkdir -p oozie-oozi/0000000-131008185935754-oozie-oozi-W/action1--ssh/ ] | ErrorStream: Warning: Permanently added '1.2.3.4,1.2.3.4' (RSA) to the list of known hosts. Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).
at org.apache.oozie.action.ssh.SshActionExecutor.executeCommand(SshActionExecutor.java:340)
at org.apache.oozie.action.ssh.SshActionExecutor.setupRemote(SshActionExecutor.java:373)
at org.apache.oozie.action.ssh.SshActionExecutor$1.call(SshActionExecutor.java:206)
at org.apache.oozie.action.ssh.SshActionExecutor$1.call(SshActionExecutor.java:204)
at org.apache.oozie.action.ssh.SshActionExecutor.execute(SshActionExecutor.java:547)
... 10 more
2013-10-09 12:48:25,982 WARN org.apache.oozie.command.wf.ActionStartXCommand: USER[user] GROUP[-] TOKEN[] APP[Test] JOB[0000000-131008185935754-oozie-oozi-W] ACTION[0000000-131008185935754-oozie-oozi-W@action1] Suspending Workflow Job id=0000000-131008185935754-oozie-oozi-W 2013-10-09 12:48:27,204 WARN org.apache.oozie.command.coord.CoordActionUpdateXCommand: USER[user] GROUP[-] TOKEN[] APP[Test] JOB[0000000-131008185935754-oozie-oozi-W] ACTION[0000000-131008185935754-oozie-oozi-W@action1] E1100: Command precondition does not hold before execution, [, coord action is null], Error Code: E1100 2013-10-09 12:59:57,477 INFO org.apache.oozie.command.wf.KillXCommand: USER[user] GROUP[-] TOKEN[] APP[Test] JOB[0000000-131008185935754-oozie-oozi-W] ACTION[-] STARTED WorkflowKillXCommand for jobId=0000000-131008185935754-oozie-oozi-W 2013-10-09 12:59:57,685 WARN org.apache.oozie.command.coord.CoordActionUpdateXCommand: USER[user] GROUP[-] TOKEN[] APP[Test] JOB[0000000-131008185935754-oozie-oozi-W] ACTION[-] E1100: Command precondition does not hold before execution, [, coord action is null], Error Code: E1100 2013-10-09 12:59:57,686 INFO org.apache.oozie.command.wf.KillXCommand: USER[user] GROUP[-] TOKEN[] APP[Test] JOB[0000000-131008185935754-oozie-oozi-W] ACTION[-] ENDED WorkflowKillXCommand for jobId=0000000-131008185935754-oozie-oozi-W 2013-10-09 13:41:32,654 WARN org.apache.oozie.command.wf.KillXCommand: USER[user] GROUP[-] TOKEN[] APP[Test] JOB[0000000-131008185935754-oozie-oozi-W] ACTION[-] E0725: Workflow instance can not be killed, 0000000-131008185935754-oozie-oozi-W, Error Code: E0725 2013-10-09 13:41:45,199 WARN org.apache.oozie.command.wf.KillXCommand: USER[user] GROUP[-] TOKEN[] APP[Test] JOB[0000000-131008185935754-oozie-oozi-W] ACTION[-] E0725: Workflow instance can not be killed, 0000000-131008185935754-oozie-oozi-W, Error Code: E0725 2013-10-09 13:42:04,869 WARN org.apache.oozie.command.wf.ResumeXCommand: USER[user] GROUP[-] TOKEN[] APP[Test] JOB[0000000-131008185935754-oozie-oozi-W] ACTION[-] E1100: Command precondition does not hold before execution, [workflow's status is KILLED is not SUSPENDED], Error Code: E1100 2013-10-09 13:45:56,357 WARN org.apache.oozie.command.wf.KillXCommand: USER[user] GROUP[-] TOKEN[] APP[Test] JOB[0000000-131008185935754-oozie-oozi-W] ACTION[-] E0725: Workflow instance can not be killed, 0000000-131008185935754-oozie-oozi-W, Error Code: E0725
Approached tried:
Thanks;
Kasa.
Workflow in Oozie is a sequence of actions arranged in a control dependency DAG (Direct Acyclic Graph). The actions are in controlled dependency as the next action can only run as per the output of current action. Subsequent actions are dependent on its previous action.
A shell action can be configured to create or delete HDFS directories before starting the Shell job. Shell launcher configuration can be specified with a file, using the job-xml element, and inline, using the configuration elements. Oozie EL expressions can be used in the inline configuration.
To check the workflow job status via the Oozie web console, with a browser go to http://localhost:11000/oozie .
I just hit a similar problem. I had a case where I could run as USER:
ssh -o PasswordAuthentication=no -o KbdInteractiveDevices=no -o StrictHostKeyChecking=no -o ConnectTimeout=20 [email protected] mkdir -p oozie-oozi/0000000-131008185935754-oozie-oozi-W/action1--ssh/
by hand on the command line and it worked, but when launched via Oozie as USER it failed.
The reason, in my case, it failed is that I set up passwordless ssh between USER on the oozie server and USER on the remote machine. What one needs to do is set up passwordless ssh between oozie on the oozie server and USER on the remote machine. In other words, su to oozie on the oozie server and run the above command by hand. If it fails, it will fail in Oozie. If it works, then it should work in Oozie (assuming all else is correct, like dir permissions, etc.)
Take a look at what user your oozie server is running as:
ps -ef | grep oozie
Whatever user that is needs passwordless ssh to USER on the remote machine.
Whatever quux00 has answered is right. I am just adding few points to that. As the command ssh in the ssh-action will be executed by oozie user, then you will need to set oozie as a bash user.
To do that you need to change the /etc/passwd file on all the nodes of the cluster. Look for the below value (similar to it) in the /etc/passwd file.
oozie:x:488:487:Oozie User:/var/lib/oozie:/bin/false
and change it to
oozie:x:488:487:Oozie User:/var/lib/oozie:/bin/bash
which will actually make oozie user a bash user. And then proceed with the password-less authentication between the oozie user and any other user that you want on any of the host machine.
And then try to rerun the oozie job again. And let me know if it works. Hope it helps!!!
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With