 

Is there a way to add nodes to a running Hadoop cluster?

I have been playing with Cloudera: I define the size of the cluster (the number of nodes) before I start my job, then use Cloudera Manager to make sure everything is running.

I'm working on a new project that uses message queues instead of Hadoop to distribute the work, but the results of the work are stored in HBase. I might launch 10 servers to process the job and write to HBase, but I'm wondering: if I later decide to add a few more worker nodes, can I easily (read: programmatically) make them connect to the running cluster automatically, so they can write to the cluster's HBase/HDFS locally?

Is this possible and what would I need to learn in order to do it?

asked Oct 31 '12 by user1735075


2 Answers

Here is the documentation for adding a node to Hadoop and for HBase. Looking at the documentation, there is no need to restart the cluster. A node can be added dynamically.
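As a rough illustration (not from the documentation linked above): once the new machine has the Hadoop and HBase packages installed and its configuration files point at the existing NameNode and ZooKeeper quorum, starting the worker daemons is usually all it takes for the node to register with the running cluster. The commands below assume a Hadoop 1.x / HBase layout with the daemon scripts on the PATH; adjust for your distribution.

    # run on the new worker node (paths are illustrative)
    hadoop-daemon.sh start datanode        # registers with the NameNode -> joins HDFS
    hadoop-daemon.sh start tasktracker     # registers with the JobTracker -> joins MapReduce (MRv1)
    hbase-daemon.sh start regionserver     # registers via ZooKeeper -> joins HBase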

answered by Praveen Sripati


The following steps should help you launch the new node into the running cluster (a rough command-line sketch follows the list).

1> Update the /etc/hadoop/conf/slaves list on the NameNode with the new node's hostname.
2> Sync the full configuration under /etc/hadoop/conf from the NameNode to the new DataNode (only needed if the configuration directory isn't on a shared file system).
3> Restart the Hadoop services on the master (NameNode/JobTracker) and start all the services on the new DataNode.
4> Verify the new DataNode in the NameNode web UI at http://namenode:50070
5> Run the balancer to redistribute data across the nodes.
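A sketch of the same sequence as shell commands, run from the NameNode host. The hostname, the paths, and the assumption that the daemon scripts are on the PATH are illustrative, not taken from the answer:

    NEW_NODE=worker-11.example.com                                   # hypothetical hostname

    echo "$NEW_NODE" >> /etc/hadoop/conf/slaves                      # 1> add to the slaves list
    rsync -av /etc/hadoop/conf/ "$NEW_NODE":/etc/hadoop/conf/        # 2> sync configuration to the new node
    ssh "$NEW_NODE" "hadoop-daemon.sh start datanode"                # 3> start services on the new node
    ssh "$NEW_NODE" "hadoop-daemon.sh start tasktracker"
    # 4> check http://namenode:50070 -- "Live Nodes" should now include the new host
    hadoop balancer -threshold 10                                    # 5> redistribute existing blocks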

If you don't want to restart the services on the NameNode every time you add a node, you can add the hostnames to the slaves configuration file ahead of time; they will simply report as decommissioned/dead nodes until they actually come online, at which point you only need to follow the DataNode-side steps above. Again, this is not best practice.
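For example, a slaves file that pre-lists a node which hasn't been provisioned yet might look like this (hostnames are hypothetical):

    # /etc/hadoop/conf/slaves
    worker-01.example.com
    worker-02.example.com
    worker-11.example.com    # not online yet; shows up as a dead node until its DataNode starts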

answered by Chakri