Hadoop HDFS - Cannot connect to port on master

Tags: networking, port, hadoop, hdfs

I've set up a small Hadoop cluster for testing. Setup went fairly well with the NameNode (1 machine), SecondaryNameNode (1) and all DataNodes (3). The machines are named "master", "secondary", "data01", "data02" and "data03". DNS is properly set up for all of them, and passwordless SSH was configured from master/secondary to all machines and back.

I formatted the cluster with bin/hadoop namenode -format, and then started all services using bin/start-all.sh. All processes on all nodes were checked to be up and running with jps. My basic configuration files look something like this:

<!-- conf/core-site.xml -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <!-- 
      on the master it's localhost
      on the others it's the master's DNS
      (ping works from everywhere)
    -->
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <!-- I picked /hdfs for the root FS -->
    <value>/hdfs/tmp</value>
  </property>
</configuration>

<!-- conf/hdfs-site.xml -->
<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>/hdfs/name</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/hdfs/data</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
</configuration>

# conf/masters
secondary

# conf/slaves
data01
data02
data03

For now I'm just trying to get HDFS running properly.

I created a directory for testing with hadoop fs -mkdir testing, then tried to copy some files into it with hadoop fs -copyFromLocal /tmp/*.txt testing. This is when Hadoop fails, giving me more or less this:

WARN hdfs.DFSClient: DataStreamer Exception: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /user/hd/testing/wordcount1.txt could only be replicated to 0 nodes, instead of 1
  at ... (such and such)

WARN hdfs.DFSClient: Error Recovery for block null bad datanode[0] nodes == null
  at ...

WARN hdfs.DFSClient: Could not get block locations. Source file "/user/hd/testing/wordcount1.txt" - Aborting...
  at ...

ERROR hdfs.DFSClient: Exception closing file /user/hd/testing/wordcount1.txt: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /user/hd/testing/wordcount1.txt could only be replicated to 0 nodes, instead of 1
  at ...

And so on. A similar issue occurs when I try to run hadoop fs -lsr . from a DataNode machine, only to get the following:

12/01/02 10:02:11 INFO ipc.Client: Retrying connect to server master/192.162.10.10:9000. Already tried 0 time(s).
12/01/02 10:02:12 INFO ipc.Client: Retrying connect to server master/192.162.10.10:9000. Already tried 1 time(s).
12/01/02 10:02:13 INFO ipc.Client: Retrying connect to server master/192.162.10.10:9000. Already tried 2 time(s).
...

I call it similar because I suspect both are a port availability issue. Running telnet master 9000 shows that the port is closed. I read somewhere that this might be an IPv6 clash, so I defined the following in conf/hadoop-env.sh:

export HADOOP_OPTS=-Djava.net.preferIPv4Stack=true

But that didn't do the trick. Running netstat on the master reveals something like this:

Proto Recv-Q Send-Q  Local Address       Foreign Address      State
tcp        0      0  localhost:9000      localhost:56387      ESTABLISHED
tcp        0      0  localhost:56386     localhost:9000       TIME_WAIT
tcp        0      0  localhost:56387     localhost:9000       ESTABLISHED
tcp        0      0  localhost:56384     localhost:9000       TIME_WAIT
tcp        0      0  localhost:56385     localhost:9000       TIME_WAIT
tcp        0      0  localhost:56383     localhost:9000       TIME_WAIT

At this point I'm pretty sure the problem is with the port (9000), but I'm not sure what I missed as far as configuration goes. Any ideas? Thanks.

Update

I found that hard-coding the host names into /etc/hosts not only helps resolve this, but also speeds up the connections. The downside is that you have to do this on all the machines in the cluster, and again whenever you add new nodes. Alternatively, you can set up a DNS server, which I didn't.

Here's a sample from one node in my cluster (nodes are named hadoop01, hadoop02, etc., with the master and secondary being 01 and 02). Note that most of it is generated by the OS:

# this is a sample for a machine with dns hadoop01
::1 localhost ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters

# --- Start list of nodes
192.168.10.101 hadoop01
192.168.10.102 hadoop02
192.168.10.103 hadoop03
192.168.10.104 hadoop04
192.168.10.105 hadoop05
192.168.10.106 hadoop06
192.168.10.107 hadoop07
192.168.10.108 hadoop08
192.168.10.109 hadoop09
192.168.10.110 hadoop10
# ... and so on

# --- End list of nodes

# Auto-generated hostname. Please do not remove this comment.
127.0.0.1 hadoop01 localhost localhost.localdomain

Hope this helps.

asked Jan 02 '12 10:01

sa125

1 Answer

On the NameNode, replace localhost in hdfs://localhost:9000 with the machine's IP address or hostname in the fs.default.name property. This is needed whenever remote nodes connect to the NameNode.
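
As a minimal sketch of that change, assuming the NameNode machine is reachable under the name master (the hostname used in the question) and keeps port 9000, conf/core-site.xml on the master might look like this. The other properties are left out here and stay as they are:

```xml
<!-- conf/core-site.xml on the NameNode machine -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <!--
      Assumption: "master" resolves to the machine's LAN address
      on every node, including the master itself, and not to 127.0.0.1.
    -->
    <value>hdfs://master:9000</value>
  </property>
  <!-- hadoop.tmp.dir and any other properties unchanged -->
</configuration>
```

With localhost as the address, the NameNode listens only on the loopback interface, which is consistent with the netstat output in the question, where every connection on port 9000 is local. If master itself resolves to 127.0.0.1 on that machine (for example through an /etc/hosts entry), the result is likely to be the same, so that is worth checking too.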

"All processes on all nodes were checked to be up and running with jps"

jps only confirms that the processes are running. It does not show whether they started cleanly, so check the log files for errors as well.

answered Oct 15 '22 03:10

Praveen Sripati
