 

Steps to replace Hadoop NameNodes and JournalNodes

Tags:

hadoop

Setup: we have 3 machines: m1, m2, and m3. Below are the roles on each of these machines:

m1: namenode (active), zookeeper, hbase master, journalnode
m2: namenode (standby), zookeeper, hbase master, journalnode
m3: zookeeper, hbase master, journalnode

We are using a NameNode HA setup with QJM (Quorum Journal Manager).
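For reference, the current active/standby assignment can be confirmed with hdfs haadmin. A minimal sketch; nn1 and nn2 are hypothetical NameNode IDs, so substitute the IDs listed under dfs.ha.namenodes.<nameservice> in hdfs-site.xml:

hdfs haadmin -getServiceState nn1   # expected: active
hdfs haadmin -getServiceState nn2   # expected: standby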

All three machines need to be replaced with new machines (with SSDs): new_m1, new_m2, and new_m3.

new_m1: namenode (active), zookeeper, hbase master, journalnode
new_m2: namenode (standby), zookeeper, hbase master, journalnode
new_m3: zookeeper, hbase master, journalnode

The replacement will incur cluster downtime, but once the new master nodes are brought up, the cluster should be able to resume its normal operations.
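A sketch of a graceful shutdown order before swapping the hardware, assuming standard tarball installs under $HBASE_HOME, $HADOOP_HOME, and $ZOOKEEPER_HOME (stop the service that depends on the others first: HBase, then HDFS, then ZooKeeper):

$HBASE_HOME/bin/stop-hbase.sh          # stops the HBase masters and regionservers
$HADOOP_HOME/sbin/stop-dfs.sh          # stops NameNodes, DataNodes, JournalNodes
$ZOOKEEPER_HOME/bin/zkServer.sh stop   # run on each of m1, m2, m3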

I need help understanding, in detail, the steps needed to replace the JournalNodes and the active + standby NameNodes with new hardware, without any data loss.

I'd greatly appreciate the most detailed step-by-step answer, thanks a ton.

There is no Hadoop version upgrade; this is just an in-place replacement of the hardware.

asked Apr 20 '15 by cog_n1t1v3

People also ask

How do I add a journal node in ambari?

In Ambari Web, select Services > HDFS > Summary. Click Service Actions, then click Manage JournalNodes. On the Assign JournalNodes page, make assignments by clicking the + and - icons and selecting host names in the drop-down menus.

How many failures can a system with 9 Journal nodes handle?

Note that when running with N JournalNodes, the system can tolerate at most (N - 1) / 2 failures and continue to function normally. For N = 9 JournalNodes, that works out to (9 - 1) / 2 = 4 tolerated failures.

What is the role of Journal nodes in Hadoop?

The role of the JournalNodes is to keep the NameNodes in sync and to prevent split-brain. The JournalNodes form a distributed service for storing the HDFS edit log.
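The NameNodes locate the JournalNode quorum through the dfs.namenode.shared.edits.dir property; a sketch of checking it follows (the qjournal URI shown is a hypothetical example for this cluster, with "mycluster" as an assumed nameservice ID):

hdfs getconf -confKey dfs.namenode.shared.edits.dir
# e.g. qjournal://m1:8485;m2:8485;m3:8485/mycluster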

What are NameNodes and DataNodes?

HDFS exposes a file system namespace and allows user data to be stored in files. Internally, a file is split into one or more blocks, and these blocks are stored in a set of DataNodes. The NameNode executes file system namespace operations like opening, closing, and renaming files and directories.
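To see this in practice, hdfs fsck can show how a single file maps to blocks and to the DataNodes holding its replicas (the path below is a hypothetical example):

hdfs fsck /user/hadoopuser/sample.txt -files -blocks -locations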


1 Answer

CASE I:

If you have installed your Hadoop, HBase, and ZooKeeper (with the temp, dfs, and namenode directories) under one common folder, it will be easy to back up the data. Let us call this folder the home folder from now on. Just do this:

1. Create the home folder on the new active NameNode machine:

sudo mkdir -p /path/to/home/folder
sudo chown -R hadoopuser:hadoopgroup /path/to/home/folder

2. Copy all contents of the home folder (permissions preserved):

sudo scp -rp /path/to/home/folder/in/old/active/namenode hadoopuser@new-active-node-ip:/path/to/home/folder

3. Repeat these two steps for the standby NameNode and the slave nodes.

NOTE: Create a backup of the /etc/hosts file on each node before editing it.

4. To reduce the workload, give your new nodes the same hostnames as the old ones in the /etc/hosts file. (Give your old nodes some other names if necessary.)
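A minimal sketch of what the edited /etc/hosts could look like on every node (all IP addresses and the old-m1 alias are hypothetical):

10.0.0.11  m1      # now resolves to new_m1's address
10.0.0.12  m2      # now resolves to new_m2's address
10.0.0.13  m3      # now resolves to new_m3's address
10.0.0.1   old-m1  # optional alias to keep the old machine reachable

Because the old hostnames now point at the new machines, the Hadoop, HBase, and ZooKeeper configuration files do not need to change.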

5. Start the new NameNode to check if it works.
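A sketch of the start-up commands on the new masters, assuming a Hadoop 2.x tarball install under $HADOOP_HOME. Start the JournalNodes on all three nodes before the NameNode, and replace nn1 with your own NameNode ID:

$HADOOP_HOME/sbin/hadoop-daemon.sh start journalnode
$HADOOP_HOME/sbin/hadoop-daemon.sh start namenode
hdfs haadmin -getServiceState nn1   # verify the NameNode came up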

CASE II:

If your Hadoop temp, dfs, namenode, and journal directories do not live under your home folder (i.e., you have configured these directories somewhere other than the home folder), do the following:

1. Identify directory locations:

Find the locations of the Hadoop temp, dfs, namenode, and journal directories in core-site.xml and hdfs-site.xml.
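A sketch of reading these locations with hdfs getconf; the property names below are the standard ones, but verify them against your own core-site.xml and hdfs-site.xml:

hdfs getconf -confKey hadoop.tmp.dir
hdfs getconf -confKey dfs.namenode.name.dir
hdfs getconf -confKey dfs.datanode.data.dir
hdfs getconf -confKey dfs.journalnode.edits.dir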

2. Copy contents:

Do step 1 and step 2 from CASE I for each directory to preserve the permissions.
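A sketch of the copy loop; the directory paths and target host are hypothetical, and the parent directories must already exist on the target, as in step 1 of CASE I:

NEW_NODE=hadoopuser@new-active-node-ip
for dir in /data/hadoop/tmp /data/hadoop/dfs/name /data/hadoop/journal; do
  # copy each directory into its parent on the target, mirroring the path
  sudo scp -rp "$dir" "$NEW_NODE:$(dirname "$dir")/"
done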

3. Start the new NameNode to check if it works.
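Once the NameNode is up, these standard HDFS commands can help confirm nothing was lost:

hdfs dfsadmin -report   # all DataNodes should report in with the expected capacity
hdfs fsck /             # the filesystem should be reported as HEALTHY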

answered Nov 15 '22 by Rajesh N