Hadoop: binding multiple IP addresses to a cluster NameNode

I have a four-node Hadoop cluster on SoftLayer. The master (NameNode) has a public IP address for external access and a private IP address for cluster access. The slave nodes (DataNodes) have only private IP addresses, and I'm trying to connect them to the master without having to assign a public IP address to each slave node.

I've realised that setting fs.defaultFS to the NameNode's public address allows external access, but the NameNode then listens only on that address for incoming connections, not on the private address. So I get ConnectionRefused exceptions in the DataNode logs as they try to connect to the NameNode's private IP address.
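
For illustration, a minimal sketch of the relevant part of my core-site.xml (the hostname and port are placeholders, not my actual values):

<property>
  <name>fs.defaultFS</name>
  <!-- Public hostname/IP of the NameNode (placeholder). With this value the
       RPC server binds only to the public interface, so DataNodes on the
       private network get ConnectionRefused. -->
  <value>hdfs://namenode-public-host:8020</value>
</property>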

I figured the solution might be to assign both the public and the private IP address to the NameNode, so that external access is preserved and my slave nodes can connect as well.

So is there a way I can bind both these addresses to the NameNode so that it will listen on both?

Edit: Hadoop version 2.4.1.

asked Aug 05 '14 by ikradex

People also ask

How many Namenodes exist on a cluster?

You can have one NameNode for the entire cluster. If you are serious about performance, you can configure another NameNode for another set of racks, but one NameNode per rack is not advisable.

How many DataNodes can run on a single Hadoop cluster?

With 100 DataNodes in a cluster, 64 GB of RAM on the NameNode provides plenty of room to grow the cluster.

Can a NameNode be a DataNode?

Yes, you can have a DataNode on the same machine as the NameNode. However, it is recommended only when you have a small cluster (a few machines, for example, fewer than 10). When using HDFS, the NameNode keeps track of all the data in the Hadoop file system.

How do you find IP address in Hadoop?

The easiest way is to open the core-site.xml file under the HADOOP_HOME/conf directory. The value of the fs.default.name property will tell you the host and port where the NameNode is running.
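
Alternatively, a quick sketch using the HDFS getconf utility (available in Hadoop 2.x, assuming the hdfs command is on your PATH):

    # Print the configured NameNode endpoint (fs.defaultFS in Hadoop 2.x,
    # fs.default.name in older releases)
    hdfs getconf -confKey fs.defaultFS

    # List the NameNode host(s) known to the configuration
    hdfs getconf -namenodes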


1 Answer

This is covered in the Apache documentation under HDFS Support for Multihomed Networks (Cloudera publishes the same guide) and in the Hortonworks reference Parameters for Multi-Homing.

It is recommended to set dfs.namenode.rpc-bind-host, dfs.namenode.servicerpc-bind-host, dfs.namenode.http-bind-host and dfs.namenode.https-bind-host (all shown below).

By default, HDFS endpoints are specified as either hostnames or IP addresses. In either case, HDFS daemons will bind to a single IP address, making the daemons unreachable from other networks.

The solution is to have a separate setting for the server endpoints that forces binding to the wildcard IP address INADDR_ANY, i.e. 0.0.0.0. Do NOT supply a port number with any of these settings.

NOTE: Prefer using hostnames over IP addresses in master/slave configuration files.

<property>
  <name>dfs.namenode.rpc-bind-host</name>
  <value>0.0.0.0</value>
  <description>
    The actual address the RPC server will bind to. If this optional address is
    set, it overrides only the hostname portion of dfs.namenode.rpc-address.
    It can also be specified per name node or name service for HA/Federation.
    This is useful for making the name node listen on all interfaces by
    setting it to 0.0.0.0.
  </description>
</property>

<property>
  <name>dfs.namenode.servicerpc-bind-host</name>
  <value>0.0.0.0</value>
  <description>
    The actual address the service RPC server will bind to. If this optional address is
    set, it overrides only the hostname portion of dfs.namenode.servicerpc-address.
    It can also be specified per name node or name service for HA/Federation.
    This is useful for making the name node listen on all interfaces by
    setting it to 0.0.0.0.
  </description>
</property>

<property>
  <name>dfs.namenode.http-bind-host</name>
  <value>0.0.0.0</value>
  <description>
    The actual address the HTTP server will bind to. If this optional address
    is set, it overrides only the hostname portion of dfs.namenode.http-address.
    It can also be specified per name node or name service for HA/Federation.
    This is useful for making the name node HTTP server listen on all
    interfaces by setting it to 0.0.0.0.
  </description>
</property>

<property>
  <name>dfs.namenode.https-bind-host</name>
  <value>0.0.0.0</value>
  <description>
    The actual address the HTTPS server will bind to. If this optional address
    is set, it overrides only the hostname portion of dfs.namenode.https-address.
    It can also be specified per name node or name service for HA/Federation.
    This is useful for making the name node HTTPS server listen on all
    interfaces by setting it to 0.0.0.0.
  </description>
</property>
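
To make the interaction concrete, here is a minimal sketch of how the bind-host setting could be combined with the client-facing RPC address in hdfs-site.xml (namenode-host and port 8020 are placeholders for your own values):

<property>
  <!-- Address that clients and DataNodes use to reach the NameNode RPC
       endpoint; namenode-host is a placeholder hostname -->
  <name>dfs.namenode.rpc-address</name>
  <value>namenode-host:8020</value>
</property>

<property>
  <!-- Overrides only the hostname portion of dfs.namenode.rpc-address,
       so the RPC server listens on all interfaces -->
  <name>dfs.namenode.rpc-bind-host</name>
  <value>0.0.0.0</value>
</property>

With this in place the RPC port is reachable on both the public and the private interface, which is the dual access described in the question.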

Note: Before starting the modification, stop the Cloudera Manager agent and server as follows:

  1. service cloudera-scm-agent stop
  2. service cloudera-scm-server stop

If your cluster is configured with primary and secondary NameNodes, then this modification needs to take place on both nodes. The modification is done with the server and agent stopped.

After completing and saving the hdfs-site.xml file, start the server and agent on the NameNodes, and also the agent on the DataNodes (this won't hurt the cluster if it's done too), using the following:

  1. service cloudera-scm-agent start
  2. service cloudera-scm-server start

The same solution can be implemented for IBM BigInsights:

    To configure HDFS to bind to all the interfaces, add the following
    configuration variable using Ambari under HDFS -> Configs -> Advanced
    -> Custom hdfs-site:

    dfs.namenode.rpc-bind-host = 0.0.0.0

    Restart HDFS to apply the configuration change.

    Verify that port 8020 is bound and listening for requests on all the
    interfaces using the following command:

    netstat -anp|grep 8020
    tcp 0 0 0.0.0.0:8020 0.0.0.0:* LISTEN 15826/java

IBM BigInsights: How to configure Hadoop client port 8020 to bind to all the network interfaces?

In Cloudera Manager's HDFS configuration there is a property called Bind NameNode to Wildcard Address; just check the box and it will bind the service to 0.0.0.0.

Then restart the HDFS service:

 On the Home > Status tab, click the dropdown to the right of the service
 name and select Restart. Click Start on the next screen to confirm.
 When you see a Finished status, the service has restarted.
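
After the restart, a quick sanity check (assuming the default NameNode ports for Hadoop 2.x, 8020 for RPC and 50070 for HTTP) could look like this:

    # Both ports should show 0.0.0.0 (all interfaces) rather than a single IP
    netstat -tlnp | grep -E ':(8020|50070)'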

See: Starting, Stopping, Refreshing, and Restarting a Cluster
See: Starting, Stopping, and Restarting Services

answered Sep 19 '22 by n1tk