Hadoop: How do datanodes register with the namenode?

Tags:

hadoop

Do Hadoop datanodes register themselves with the namenode by calling the namenode, or does the namenode keep a list of datanodes and reach out to them?

I want to understand this so I can better troubleshoot a problem with a new namenode I brought up (after a namenode failure): it doesn't see any of the datanodes, although its fsimage is correct.

David Parks asked May 21 '13 08:05


People also ask

How does NameNode connect to DataNode?

A client establishes a connection to a configurable TCP port on the NameNode machine. It talks the ClientProtocol with the NameNode. The DataNodes talk to the NameNode using the DataNode Protocol. A Remote Procedure Call (RPC) abstraction wraps both the Client Protocol and the DataNode Protocol.

How a NameNode and DataNode communicate with each other?

All communication between Namenode and Datanode is initiated by the Datanode, and responded to by the Namenode. The Namenode never initiates communication to the Datanode, although Namenode responses may include commands to the Datanode that cause it to send further communications.

How does the NameNode choose which DataNodes to store replicas on?

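The namenode decides where each replica goes when it instructs datanodes to store a block. With the default placement policy and a replication factor of 3, the first replica is stored on the local machine (the node the writer runs on, if that node is a datanode), the second on a node in a different rack, and the third on a different node in that same remote rack. If a replica is lost, the block is re-replicated from one of the surviving replicas.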

Which protocol is used for communication between NameNode and DataNodes?

The DataNodes talk to the NameNode using the DataNode Protocol. A Remote Procedure Call (RPC) abstraction wraps both the Client Protocol and the DataNode Protocol. By design, the NameNode never initiates any RPCs. Instead, it only responds to RPC requests issued by DataNodes or clients.


2 Answers

Data nodes heartbeat in to the name node. The name node does not reach out to data nodes.

Even when retrieving data, the name node does not reach out to the data nodes. The name node tells the client where the data is, and the client retrieves it from the data nodes directly. (To clarify, during a MapReduce job the JobTracker asks the name node where the data is and assigns work to TaskTrackers accordingly.)
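
To make the direction of the calls concrete, here is a minimal toy model in Python. It is not Hadoop code: the class names, the `REGISTER` reply, and the 30-second expiry are all made up for illustration. The only point it shows is that the "namenode" object never calls a datanode; it only answers calls and piggybacks commands on its replies.

```python
import time


class NameNode:
    """Toy namenode: it only answers calls, it never dials out."""

    def __init__(self, expiry_seconds=30.0):   # arbitrary value for the sketch
        self.expiry_seconds = expiry_seconds
        self.last_seen = {}   # datanode id -> time of last contact
        self.pending = {}     # datanode id -> commands queued for it

    def register(self, node_id, now=None):
        # Called BY the datanode, typically at startup.
        self.last_seen[node_id] = time.time() if now is None else now
        self.pending.setdefault(node_id, [])

    def heartbeat(self, node_id, now=None):
        # Called BY the datanode; the reply is the only channel for commands.
        if node_id not in self.last_seen:
            return ["REGISTER"]   # unknown node, e.g. a freshly started namenode
        self.last_seen[node_id] = time.time() if now is None else now
        commands, self.pending[node_id] = self.pending[node_id], []
        return commands

    def queue_command(self, node_id, command):
        # Nothing is sent here; the command waits for the next heartbeat.
        self.pending[node_id].append(command)

    def live_nodes(self, now=None):
        now = time.time() if now is None else now
        return sorted(n for n, t in self.last_seen.items()
                      if now - t <= self.expiry_seconds)


class DataNode:
    def __init__(self, node_id, namenode):
        self.node_id = node_id
        self.namenode = namenode   # stands in for an address read from config

    def send_heartbeat(self, now=None):
        commands = self.namenode.heartbeat(self.node_id, now)
        if "REGISTER" in commands:
            self.namenode.register(self.node_id, now)
        return commands


if __name__ == "__main__":
    nn = NameNode()
    dn = DataNode("dn1", nn)
    print(nn.live_nodes())      # nothing is known until a datanode calls in
    print(dn.send_heartbeat())  # first contact: told to register
    print(nn.live_nodes())
```

In this model a replacement namenode starts with an empty `last_seen` table, so it sees no datanodes until they call it. If the datanodes are still pointed at the old namenode's address, that never happens.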

Ilion answered Sep 22 '22 15:09


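Each datanode keeps the namenode's address in its configuration (the `fs.default.name` property in core-site.xml, renamed `fs.defaultFS` in newer releases). The namenode host keeps the names of all datanodes in its slaves file, which the start-up scripts use to launch the datanode daemons. I think you should check the slaves file on the namenode and the namenode address configured on the datanodes.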
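
A quick way to check which namenode a datanode will try to call is to read the address out of its site configuration and test whether that host and port accept connections. The sketch below assumes the address lives in core-site.xml under `fs.defaultFS` (or the older `fs.default.name`). The host name and port in the sample are hypothetical, and the helper names are my own.

```python
import socket
import xml.etree.ElementTree as ET
from urllib.parse import urlparse


def namenode_address(core_site_xml):
    """Return (host, port) from fs.defaultFS / fs.default.name, or None."""
    root = ET.fromstring(core_site_xml)
    props = {p.findtext("name", "").strip(): p.findtext("value", "").strip()
             for p in root.iter("property")}
    uri = props.get("fs.defaultFS") or props.get("fs.default.name")
    if not uri:
        return None
    parsed = urlparse(uri)
    return parsed.hostname, parsed.port


def can_reach(host, port, timeout=3.0):
    """True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Hypothetical sample; on a real datanode, read the file from its conf directory.
SAMPLE = """<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://nn1.example.com:8020</value>
  </property>
</configuration>"""

if __name__ == "__main__":
    address = namenode_address(SAMPLE)
    print("configured namenode:", address)
    if address and address[1] is not None:
        print("reachable:", can_reach(*address))
```

If the configured host still resolves to the failed namenode, or the port is not reachable from the datanode, the new namenode will never hear from that datanode.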

abhinav answered Sep 22 '22 15:09