I am trying to run the wordcount example on a Hadoop 2.2.0 cluster. Many map tasks are failing with this exception:
2014-01-07 05:07:12,544 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.net.ConnectException: Call From slave2-machine/127.0.1.1 to slave2-machine:49222 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:783)
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:730)
at org.apache.hadoop.ipc.Client.call(Client.java:1351)
at org.apache.hadoop.ipc.Client.call(Client.java:1300)
at org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:231)
at com.sun.proxy.$Proxy6.getTask(Unknown Source)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:133)
Caused by: java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:708)
at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:529)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:493)
at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:547)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:642)
at org.apache.hadoop.ipc.Client$Connection.access$2600(Client.java:314)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1399)
at org.apache.hadoop.ipc.Client.call(Client.java:1318)
... 4 more
Every time I run the job the problematic port changes, but the map tasks still fail. I don't know which process is supposed to listen on that port. I also tried tracking the output of netstat -ntlp during the run, and no process ever listened on the port.
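For reference, the check I was running was roughly this (49222 is just the port from the log above; it changes on every attempt):
sudo netstat -ntlp | grep 49222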
UPDATE: the content of /etc/hosts on the master node is:
127.0.0.1 localhost
127.0.1.1 master-machine
# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
192.168.1.101 slave1 slave1-machine
192.168.1.102 slave2 slave2-machine
192.168.1.1 master
and for slave1 it is:
127.0.0.1 localhost
127.0.1.1 slave1-machine
# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
192.168.1.1 master
192.168.1.101 slave1
192.168.1.102 slave2 slave2-machine
For slave2 it is analogous to slave1's, with the obvious substitutions (127.0.1.1 slave2-machine, as the exception above confirms). Finally, the content of yarn/hadoop/etc/hadoop/slaves on the master is:
slave1
slave2
1. Check whether the Hadoop nodes can ssh to each other.
2. Check that the addresses and ports of the Hadoop daemons are consistent across the config files on all nodes.
3. Check /etc/hosts on all nodes (sanity-check commands are sketched below).
This is a useful link for checking whether you have launched the cluster correctly: cluster setup
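A minimal way to sanity-check these points from the command line (the hostnames master, slave1, slave2 are the ones from the question; adjust them to your cluster):
# 1. passwordless ssh between the nodes
ssh slave1 hostname
ssh slave2 hostname
# 2. the expected daemons are actually running on each node (jps ships with the JDK)
ssh slave1 jps
# 3. every hostname resolves to its LAN address, not 127.0.x.x
getent hosts master slave1 slave2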
I got it! Your /etc/hosts files are not correct; you should remove the 127.0.1.1 line. That line is what makes slave2-machine resolve to the loopback address, so the map task tries to connect to slave2-machine:49222 on 127.0.1.1, where nothing is listening (exactly what the exception above shows). The files should look like this:
127.0.0.1 localhost
192.168.1.101 master
192.168.1.103 slave1
192.168.1.104 slave2
192.168.1.105 slave3
Copy the same file to all the slaves. Additionally, the slaves should be able to ssh to each other, too.
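After fixing /etc/hosts, a quick way to verify and apply the change (the $HADOOP_HOME path and the stock sbin scripts are assumptions based on a standard Hadoop 2.2.0 install):
# should now print the LAN address (192.168.1.x), not 127.0.1.1
getent hosts $(hostname)
# restart the daemons so they re-bind to the corrected address
$HADOOP_HOME/sbin/stop-yarn.sh; $HADOOP_HOME/sbin/stop-dfs.sh
$HADOOP_HOME/sbin/start-dfs.sh; $HADOOP_HOME/sbin/start-yarn.sh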