 

Why do we need Hadoop passwordless ssh?

  • AFAIK, passwordless ssh is needed so that the master node can start the daemon processes on each slave node. Apart from that, is there any use of having passwordless ssh for Hadoop's operation?

  • How are the user code jars and data chunks transferred across the slave nodes? I want to know the mechanism and the protocol used.

  • Should passwordless SSH be configured ONLY for master-slave pairs, or also among the slaves themselves?

Tejas Patil asked Dec 17 '12 06:12



1 Answer

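You are correct. If ssh is not passwordless, you have to log in to each individual machine and start all the processes there manually. For your second question, all communication in HDFS is layered on top of TCP/IP. Block data is streamed between clients and DataNodes over a TCP-based data transfer protocol, while HTTP is used for the MapReduce shuffle (reducers fetching map output) rather than for HDFS block transfer. The mechanism goes like this: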

A client establishes a connection to a configurable TCP port on the NameNode machine. It talks the ClientProtocol with the NameNode. The DataNodes talk to the NameNode using the DataNode Protocol. A Remote Procedure Call (RPC) abstraction wraps both the Client Protocol and the DataNode Protocol.
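To make the "RPC over a TCP connection" idea concrete, here is a minimal toy sketch. It is not Hadoop's wire format: Hadoop RPC uses its own serialization, whereas this example assumes length-prefixed JSON frames purely for illustration. `ToyNameNode` and `get_block_locations` are hypothetical names, and a local socket pair stands in for the client-to-NameNode TCP connection.

```python
import json
import socket
import struct


def send_frame(sock, obj):
    """Serialize obj as JSON and send it with a 4-byte big-endian length prefix."""
    payload = json.dumps(obj).encode("utf-8")
    sock.sendall(struct.pack(">I", len(payload)) + payload)


def recv_exact(sock, n):
    """Read exactly n bytes from the stream, or raise if the peer closes early."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed connection")
        buf += chunk
    return buf


def recv_frame(sock):
    """Read one length-prefixed JSON frame and return the decoded object."""
    (length,) = struct.unpack(">I", recv_exact(sock, 4))
    return json.loads(recv_exact(sock, length).decode("utf-8"))


class ToyNameNode:
    """Hypothetical stand-in for a NameNode: maps a file path to block locations."""

    ALLOWED = {"get_block_locations"}  # only these methods may be called remotely

    def __init__(self, block_map):
        self.block_map = block_map

    def get_block_locations(self, path):
        return self.block_map.get(path, [])


def serve_one(sock, handler):
    """Server side: read one request, dispatch it by method name, send the result."""
    request = recv_frame(sock)
    if request["method"] not in handler.ALLOWED:
        send_frame(sock, {"error": "unknown method"})
        return
    result = getattr(handler, request["method"])(*request["args"])
    send_frame(sock, {"result": result})


def rpc_demo():
    """Run one request/response round trip over a connected local socket pair."""
    client, server = socket.socketpair()
    try:
        # Client side: the "remote call" is just a method name plus arguments.
        send_frame(client, {"method": "get_block_locations", "args": ["/data/a.txt"]})
        serve_one(server, ToyNameNode({"/data/a.txt": ["dn1", "dn2"]}))
        return recv_frame(client)["result"]
    finally:
        client.close()
        server.close()


if __name__ == "__main__":
    print(rpc_demo())
```

The point of the sketch is only the layering: the caller sees a method call, and underneath it is a request and a response travelling over an ordinary TCP stream.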

And for the third question, passwordless ssh among the slave nodes is not necessary. It is only needed from the machine where you run the start/stop scripts (the master) to each slave.
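A rough sketch of why only the master-to-slave direction matters: the start scripts essentially loop over the hosts listed in the slaves file and run a daemon command on each one over ssh. The snippet below only builds those commands and does not run them. The script path `bin/hadoop-daemon.sh` and the host names are assumptions for illustration and depend on your installation; `BatchMode=yes` is a standard OpenSSH option that makes ssh fail instead of prompting for a password.

```python
import shlex


def read_slaves(text):
    """Parse the body of a slaves file: one hostname per line, '#' starts a comment."""
    hosts = []
    for line in text.splitlines():
        host = line.split("#", 1)[0].strip()
        if host:
            hosts.append(host)
    return hosts


def build_start_commands(slaves, daemon="datanode", script="bin/hadoop-daemon.sh"):
    """Return one ssh argument list per slave.

    Every command originates on the master, so the slaves never need
    to ssh into each other.
    """
    remote = f"{script} start {daemon}"
    return [["ssh", "-o", "BatchMode=yes", host, remote] for host in slaves]


if __name__ == "__main__":
    slaves = read_slaves("slave1\nslave2  # hypothetical second node\n")
    for argv in build_start_commands(slaves):
        print(" ".join(shlex.quote(arg) for arg in argv))
```

If key-based login is missing for some host, a command like these fails immediately for that host, which is a quick way to check the setup before starting the cluster.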

Tariq answered Sep 22 '22 11:09
