 

Oozie jobs stuck in running state

Tags:

java

hadoop

oozie

I installed Oozie 4.0.1 on a Hadoop 2.2 cluster. After that, I tried to run an Oozie job (Java action). Everything seemed to be fine:

  • When I submit the job with job.properties, it returns the job ID as usual.
  • When I check the Oozie console, the job is in RUNNING state.
  • It runs the Java code.

However, Oozie then suddenly stops and logs the following error:

  ACTION[0000001-140526105244150-oozie-labu-W@javaMainAction] Exception in check().    Message[java.net.ConnectException: Call From labuser-VirtualBox/127.0.1.1 to localhost:10020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused]
java.io.IOException: java.net.ConnectException: Call From labuser-VirtualBox/127.0.1.1 to localhost:10020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
    at org.apache.hadoop.mapred.ClientServiceDelegate.invoke(ClientServiceDelegate.java:331)
    at org.apache.hadoop.mapred.ClientServiceDelegate.getJobStatus(ClientServiceDelegate.java:416)
    at org.apache.hadoop.mapred.YARNRunner.getJobStatus(YARNRunner.java:522)
    at org.apache.hadoop.mapreduce.Cluster.getJob(Cluster.java:183)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:580)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:578)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
    at org.apache.hadoop.mapred.JobClient.getJobUsingCluster(JobClient.java:578)
    at org.apache.hadoop.mapred.JobClient.getJob(JobClient.java:596)
    at org.apache.oozie.action.hadoop.JavaActionExecutor.getRunningJob(JavaActionExecutor.java:992)
    at org.apache.oozie.action.hadoop.JavaActionExecutor.check(JavaActionExecutor.java:1005)
    at org.apache.oozie.command.wf.ActionCheckXCommand.execute(ActionCheckXCommand.java:177)
    at org.apache.oozie.command.wf.ActionCheckXCommand.execute(ActionCheckXCommand.java:56)
    at org.apache.oozie.command.XCommand.call(XCommand.java:280)
    at org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:175)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.ConnectException: Call From labuser-VirtualBox/127.0.1.1 to localhost:10020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
    at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:783)
    at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:730)
    at org.apache.hadoop.ipc.Client.call(Client.java:1351)
    at org.apache.hadoop.ipc.Client.call(Client.java:1300)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
    at com.sun.proxy.$Proxy31.getJobReport(Unknown Source)
    at org.apache.hadoop.mapreduce.v2.api.impl.pb.client.MRClientProtocolPBClientImpl.getJobReport(MRClientProtocolPBClientImpl.java:133)
    at sun.reflect.GeneratedMethodAccessor44.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.mapred.ClientServiceDelegate.invoke(ClientServiceDelegate.java:317)
    ... 19 more
Caused by: java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
    at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:529)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:493)
    at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:547)
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:642)
    at org.apache.hadoop.ipc.Client$Connection.access$2600(Client.java:314)
    at org.apache.hadoop.ipc.Client.getConnection(Client.java:1399)
    at org.apache.hadoop.ipc.Client.call(Client.java:1318)
    ... 27 more
2014-05-26 11:02:15,305  WARN ActionCheckXCommand:542 - USER[labuser] GROUP[-] TOKEN[] APP[WorkflowJavaMainAction] JOB[0000001-140526105244150-oozie-labu-W] ACTION[0000001-140526105244150-oozie-labu-W@javaMainAction] Exception while executing check(). Error Code [  JA006], Message[  JA006: Call From labuser-VirtualBox/127.0.1.1 to localhost:10020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused]
org.apache.oozie.action.ActionExecutorException:   JA006: Call From labuser-VirtualBox/127.0.1.1 to localhost:10020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
    at org.apache.oozie.action.ActionExecutor.convertExceptionHelper(ActionExecutor.java:412)
    at org.apache.oozie.action.ActionExecutor.convertException(ActionExecutor.java:392)
    at org.apache.oozie.action.hadoop.JavaActionExecutor.check(JavaActionExecutor.java:1095)
    at org.apache.oozie.command.wf.ActionCheckXCommand.execute(ActionCheckXCommand.java:177)
    at org.apache.oozie.command.wf.ActionCheckXCommand.execute(ActionCheckXCommand.java:56)
    at org.apache.oozie.command.XCommand.call(XCommand.java:280)
    at org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:175)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.ConnectException: Call From labuser-VirtualBox/127.0.1.1 to localhost:10020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
    at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:783)
    at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:730)
    at org.apache.hadoop.ipc.Client.call(Client.java:1351)
    at org.apache.hadoop.ipc.Client.call(Client.java:1300)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
    at com.sun.proxy.$Proxy31.getJobReport(Unknown Source)
    at org.apache.hadoop.mapreduce.v2.api.impl.pb.client.MRClientProtocolPBClientImpl.getJobReport(MRClientProtocolPBClientImpl.java:133)
    at sun.reflect.GeneratedMethodAccessor44.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.mapred.ClientServiceDelegate.invoke(ClientServiceDelegate.java:317)
    at org.apache.hadoop.mapred.ClientServiceDelegate.getJobStatus(ClientServiceDelegate.java:416)
    at org.apache.hadoop.mapred.YARNRunner.getJobStatus(YARNRunner.java:522)
    at org.apache.hadoop.mapreduce.Cluster.getJob(Cluster.java:183)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:580)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:578)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
    at org.apache.hadoop.mapred.JobClient.getJobUsingCluster(JobClient.java:578)
    at org.apache.hadoop.mapred.JobClient.getJob(JobClient.java:596)
    at org.apache.oozie.action.hadoop.JavaActionExecutor.getRunningJob(JavaActionExecutor.java:992)
    at org.apache.oozie.action.hadoop.JavaActionExecutor.check(JavaActionExecutor.java:1005)
    ... 7 more
Caused by: java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
    at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:529)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:493)
    at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:547)
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:642)
    at org.apache.hadoop.ipc.Client$Connection.access$2600(Client.java:314)
    at org.apache.hadoop.ipc.Client.getConnection(Client.java:1399)
    at org.apache.hadoop.ipc.Client.call(Client.java:1318)
    ... 27 more
2014-05-26 11:02:15,307  INFO ActionCheckXCommand:539 - USER[labuser] GROUP[-] TOKEN[] APP[WorkflowJavaMainAction] JOB[0000001-140526105244150-oozie-labu-W] ACTION[0000001-140526105244150-oozie-labu-W@javaMainAction] Next Retry, Attempt Number [1] in [60,000] milliseconds

What is the problem?

Asked by Reddevil, Jan 11 '23 13:01



1 Answer

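If you are running Hadoop 2.2.0, you must start the MapReduce JobHistory Server to avoid the error above. Port 10020 is the JobHistory Server's default RPC port (property mapreduce.jobhistory.address), and the stack trace shows Oozie's status check (JavaActionExecutor.check calling ClientServiceDelegate.getJobStatus) trying to reach it on localhost:10020. Once the launcher job's ApplicationMaster has finished, the MapReduce client asks the JobHistory Server for the job status, which is why the Java code runs and the failure only appears afterwards. "Connection refused" means nothing is listening on that port.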
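
The 10020 in the error message is the default value of mapreduce.jobhistory.address. The following is a minimal, JDK-only sketch showing where that address comes from: it reads the property from a mapred-site.xml string and falls back to the Hadoop 2.x default 0.0.0.0:10020 when the property is absent. The class JobHistoryAddress and its methods are hypothetical, not part of Hadoop, and this is a simplified reader for illustration only; Hadoop's own Configuration class also handles things such as variable expansion and final properties.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical helper: finds the JobHistory Server RPC address in a mapred-site.xml document. */
public class JobHistoryAddress {
    static final String KEY = "mapreduce.jobhistory.address";
    // Hadoop 2.x default for mapreduce.jobhistory.address (see mapred-default.xml).
    static final String DEFAULT = "0.0.0.0:10020";

    /** Returns the first configured value for KEY, or DEFAULT when the property is absent. */
    public static String resolve(String mapredSiteXml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(mapredSiteXml.getBytes(StandardCharsets.UTF_8)));
        NodeList props = doc.getElementsByTagName("property");
        for (int i = 0; i < props.getLength(); i++) {
            Element p = (Element) props.item(i);
            if (KEY.equals(text(p, "name"))) {
                return text(p, "value");
            }
        }
        return DEFAULT;
    }

    /** Extracts the port number from a host:port string. */
    public static int port(String address) {
        return Integer.parseInt(address.substring(address.lastIndexOf(':') + 1).trim());
    }

    // Returns the trimmed text of the first descendant element with the given tag, or null.
    private static String text(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
    }

    public static void main(String[] args) throws Exception {
        String empty = "<configuration></configuration>";
        String custom = "<configuration><property><name>mapreduce.jobhistory.address</name>"
                + "<value>historyhost:10020</value></property></configuration>";
        System.out.println(resolve(empty));       // 0.0.0.0:10020
        System.out.println(resolve(custom));      // historyhost:10020
        System.out.println(port(resolve(empty))); // 10020
    }
}
```

If mapred-site.xml does not override the property, clients look for the history server on port 10020, so that port must have a running JobHistory Server behind it.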

The history server start script is located in hadoop/sbin. The JobHistory Server is often not running after Hadoop starts (start-dfs.sh and start-yarn.sh do not launch it), so you must start it manually with the following command:

hadoop/sbin/mr-jobhistory-daemon.sh start historyserver
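
After starting the daemon, you can confirm that something now accepts connections on the port from the error. This is a hedged, JDK-only sketch (HistoryServerProbe is a hypothetical name); it assumes the default localhost:10020 unless a host and port are passed as arguments, and it only checks that a TCP connection succeeds, not that the listener really is the JobHistory Server.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Hypothetical check: is anything accepting TCP connections on the JobHistory RPC port? */
public class HistoryServerProbe {
    /** Returns true if a TCP connection to host:port succeeds within timeoutMs milliseconds. */
    public static boolean isListening(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // Connection refused, timed out, or host could not be resolved.
            return false;
        }
    }

    public static void main(String[] args) {
        // Defaults match the address in the Oozie error: localhost:10020.
        String host = args.length > 0 ? args[0] : "localhost";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 10020;
        if (isListening(host, port, 2000)) {
            System.out.println("Something is listening on " + host + ":" + port);
        } else {
            System.out.println("Nothing is listening on " + host + ":" + port
                    + "; start it with: mr-jobhistory-daemon.sh start historyserver");
        }
    }
}
```

Running `jps` on the same machine should also list a JobHistoryServer process once the daemon is up.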
Answered by Suresh Ram, Jan 27 '23 09:01
