Hadoop maps are failing due to ConnectException

I am trying to run the wordcount example on a Hadoop 2.2.0 cluster. Many map tasks are failing with this exception:

2014-01-07 05:07:12,544 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.net.ConnectException: Call From slave2-machine/127.0.1.1 to slave2-machine:49222 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
    at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:783)
    at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:730)
    at org.apache.hadoop.ipc.Client.call(Client.java:1351)
    at org.apache.hadoop.ipc.Client.call(Client.java:1300)
    at org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:231)
    at com.sun.proxy.$Proxy6.getTask(Unknown Source)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:133)
Caused by: java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:708)
    at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:529)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:493)
    at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:547)
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:642)
    at org.apache.hadoop.ipc.Client$Connection.access$2600(Client.java:314)
    at org.apache.hadoop.ipc.Client.getConnection(Client.java:1399)
    at org.apache.hadoop.ipc.Client.call(Client.java:1318)
    ... 4 more

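For reference, I launch the job with the standard examples jar that ships with Hadoop 2.2.0 (the $HADOOP_HOME and input/output paths here are placeholders):

hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar wordcount /user/me/input /user/me/output
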
Every time I run the job, the problematic port changes, but the map tasks still fail. I don't know which process is supposed to be listening on that port. I also tried tracking the output of netstat -ntlp during a run, and no process ever listened on that port.
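
This is roughly how I watched the ports during a run (a minimal sketch; 49222 is just the port from the failure above, and it is different on every run):

watch -n 1 'netstat -ntlp | grep 49222'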

UPDATE: the content of /etc/hosts on the master node is:

127.0.0.1   localhost
127.0.1.1   master-machine

# The following lines are desirable for IPv6 capable hosts
::1     ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
192.168.1.101 slave1 slave1-machine
192.168.1.102 slave2 slave2-machine
192.168.1.1 master

and on slave1 it is:

127.0.0.1   localhost
127.0.1.1   slave1-machine

# The following lines are desirable for IPv6 capable hosts
::1     ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
192.168.1.1 master
192.168.1.101 slave1
192.168.1.102 slave2 slave2-machine

For slave2 it is like slave1's, with the analogous changes. Finally, the content of yarn/hadoop/etc/hadoop/slaves on the master is:

slave1
slave2
asked Jan 07 '14 by Mehraban

1 Answer

1. Check whether the Hadoop nodes can ssh to each other.
2. Check that the addresses and ports of the Hadoop daemons are consistent across the config files on all nodes.
3. Check /etc/hosts on all nodes.

This is a useful link for checking whether you have launched the cluster correctly: cluster setup
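
For example, these checks could be run from the master like this (a rough sketch; the $HADOOP_HOME path and the hostnames are assumptions taken from the question):

# 1. verify passwordless ssh from the master to every slave
for host in slave1 slave2; do ssh "$host" hostname; done

# 2. confirm the daemon addresses/ports match in the config files on each node
grep -A1 'fs.defaultFS' $HADOOP_HOME/etc/hadoop/core-site.xml
grep -A1 'yarn.resourcemanager' $HADOOP_HOME/etc/hadoop/yarn-site.xml

# 3. check that every hostname resolves the same way on all nodes
getent hosts master slave1 slave2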

I got it! Your /etc/hosts files are not correct: you should remove the 127.0.1.1 line. I mean they should be like this:

127.0.0.1       localhost
192.168.1.101    master
192.168.1.103    slave1
192.168.1.104    slave2
192.168.1.105    slave3

Copy this same file to all of the slaves. Additionally, the slaves should be able to ssh to each other, too.
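
After editing /etc/hosts, you can confirm that each hostname now resolves to its LAN address instead of 127.0.1.1 (a small sketch; run it on every node, with the hostnames from the files above):

# each line should print a 192.168.1.x address, not 127.0.1.1 or 127.0.0.1
getent hosts $(hostname)
getent hosts master slave1 slave2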

answered Nov 15 '22 by masoumeh