 

How do I correctly remove nodes in Hadoop?

Tags:

hadoop

I'm running Hadoop 1.1.2 on a cluster with 10+ machines. I would like to nicely scale up and down, both for HDFS and MapReduce. By "nicely", I mean that data must not be lost (HDFS nodes should decommission cleanly), and that nodes running a task should finish it before shutting down.

I've noticed the datanode process dies once decommissioning is done, which is good. This is what I do to remove a node (a concrete sketch of these steps follows the list):

  • Add node to mapred.exclude
  • Add node to hdfs.exclude
  • $ hadoop mradmin -refreshNodes
  • $ hadoop dfsadmin -refreshNodes
  • $ hadoop-daemon.sh stop tasktracker
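
For concreteness, here is a minimal sketch of that removal sequence as run from the master, assuming the exclude files live at /etc/hadoop/conf/mapred.exclude and /etc/hadoop/conf/hdfs.exclude (the paths and the worker hostname are illustrative):

  # Hypothetical hostname of the node being removed
  NODE=worker05.example.com

  # Exclude it from MapReduce and HDFS; these paths must match whatever
  # mapred.hosts.exclude and dfs.hosts.exclude point to in your config
  echo "$NODE" >> /etc/hadoop/conf/mapred.exclude
  echo "$NODE" >> /etc/hadoop/conf/hdfs.exclude

  # Tell the JobTracker and NameNode to reread the exclude files
  hadoop mradmin -refreshNodes
  hadoop dfsadmin -refreshNodes

  # Finally, on the node being removed:
  # hadoop-daemon.sh stop tasktracker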

To add the node back in (assuming it was removed as above), this is what I'm doing (again, see the sketch after the list):

  • Remove from mapred.exclude
  • Remove from hdfs.exclude
  • $ hadoop mradmin -refreshNodes
  • $ hadoop dfsadmin -refreshNodes
  • $ hadoop-daemon.sh start tasktracker
  • $ hadoop-daemon.sh start datanode
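
And the corresponding sketch for adding the node back, under the same assumptions:

  NODE=worker05.example.com   # same hypothetical hostname as above

  # Drop the node from both exclude files
  sed -i "/^${NODE}$/d" /etc/hadoop/conf/mapred.exclude
  sed -i "/^${NODE}$/d" /etc/hadoop/conf/hdfs.exclude

  hadoop mradmin -refreshNodes
  hadoop dfsadmin -refreshNodes

  # Then, on the node being re-added:
  # hadoop-daemon.sh start tasktracker
  # hadoop-daemon.sh start datanode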

Is this the correct way to scale up and down "nicely"? When scaling down, I'm noticing that job duration rises sharply for certain unlucky jobs (since the tasks they had running on the removed node need to be rescheduled).

Asked May 27 '13 by Philippe Signoret



2 Answers

If you have not set a dfs exclude file before, follow steps 1-3; otherwise start from step 4.

  1. Shut down the NameNode.
  2. Set dfs.hosts.exclude to point to an empty exclude file (see the configuration sketch after this list).
  3. Restart the NameNode.
  4. In the dfs exclude file, specify the nodes using the full hostname, IP, or IP:port format.
  5. Do the same in mapred.exclude.
  6. Execute bin/hadoop dfsadmin -refreshNodes. This forces the NameNode to reread the exclude file and start the decommissioning process.
  7. Execute bin/hadoop mradmin -refreshNodes.
  8. Monitor the NameNode and JobTracker web UIs and confirm that decommissioning is in progress. It can take a few seconds to update. Messages like "Decommission complete for node XXXX.XXXX.X.XX:XXXXX" will appear in the NameNode log files when decommissioning finishes, at which point you can remove the nodes from the cluster.
  9. When the process has completed, the NameNode UI will list the DataNode as decommissioned. The JobTracker page will show the updated number of active nodes. Run bin/hadoop dfsadmin -report to verify. Stop the DataNode and TaskTracker processes on the excluded node(s).
  10. If you do not plan to reintroduce the machine to the cluster, remove it from both the include and exclude files.
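
As a minimal sketch of steps 2 and 5, assuming the configuration lives under /etc/hadoop/conf (the directory and file names here are illustrative), the relevant properties go inside the <configuration> element of hdfs-site.xml and mapred-site.xml respectively:

  <!-- hdfs-site.xml -->
  <property>
    <name>dfs.hosts.exclude</name>
    <value>/etc/hadoop/conf/hdfs.exclude</value>
  </property>

  <!-- mapred-site.xml -->
  <property>
    <name>mapred.hosts.exclude</name>
    <value>/etc/hadoop/conf/mapred.exclude</value>
  </property>

The exclude files themselves are plain text, one entry (hostname, IP, or IP:port) per line.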

To add a node as a DataNode and TaskTracker, see the Hadoop FAQ page.

EDIT: When a live node is removed from the cluster, what happens to the job?

The jobs running on a node being decommissioned are affected: the tasks scheduled on that node are marked KILLED_UNCLEAN (for map and reduce tasks) or KILLED (for job setup and cleanup tasks). See line 4633 in JobTracker.java for details. The JobTracker is informed so it can fail those tasks, and most of the time it will reschedule them elsewhere. However, after many repeated failures it may instead decide to let the entire job fail or succeed. See line 2957 onwards in JobInProgress.java.
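
If you want to watch this from the command line for an affected job, a quick check (the job id here is made up):

  # Prints the job state and map/reduce completion percentages
  hadoop job -status job_201305271305_0042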

Answered by Tejas Patil

You should be aware that for Hadoop to perform well, it really wants to have the data available in multiple copies. By removing nodes, you reduce the chances of the data being optimally available, and you put extra stress on the cluster to ensure availability.

I.e. by taking down a node, you force an extra copy of all its data to be made somewhere else. So you shouldn't really be doing this just for fun, unless you use a different data management paradigm than the default configuration (which keeps 3 copies in the cluster).
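
A quick way to see that extra stress is to compare the filesystem report before and after removing a node (a sketch; run it from any cluster client):

  # The summary at the end reports "Under-replicated blocks" and the
  # default replication factor for the cluster
  hadoop fsck / | tail -n 20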

And for a Hadoop cluster to perform well, you will want to actually store the data in the cluster. Otherwise, you can't really move the computation to the data, because the data isn't there yet. Much of Hadoop is about having "smart drives" that can perform computation before sending the data across the network.

So to make this reasonable, you will likely need to split your cluster somehow: have one set of nodes keep the 3 master copies of the original data, and some "add-on" nodes that are only used for storing intermediate data and performing computations on that part. Never change the master nodes, so they don't need to redistribute your data. Shut down add-on nodes only when they are empty? But that probably is not yet implemented.

Answered by Has QUIT--Anony-Mousse