
NameNode vs. Secondary NameNode


Hadoop is consistent and partition tolerant, i.e. it falls under the CP category of the CAP theorem.

Hadoop is not available because all the nodes depend on the name node. If the name node fails, the cluster goes down.

But given that an HDFS cluster has a secondary name node, why can't we call Hadoop available? If the name node is down, the secondary name node could be used for the writes.

What is the major difference between the name node and the secondary name node that makes Hadoop unavailable?

Thanks in advance.

Sam asked Nov 14 '13 05:11


People also ask

How is the Secondary NameNode different from the NameNode in HDFS?

The Secondary NameNode is just a helper for the NameNode. It fetches the edit logs from the NameNode at regular intervals and applies them to the fsimage. Once it has a new fsimage, it copies it back to the NameNode. The NameNode uses this fsimage on its next restart, which reduces the startup time.

What is the purpose of the secondary name node?

The secondary NameNode merges the fsimage and the edits log files periodically and keeps edits log size within a limit. It is usually run on a different machine than the primary NameNode since its memory requirements are on the same order as the primary NameNode.

What is the difference between the Secondary NameNode, the Backup node, and the Checkpoint node?

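Checkpoint node: it periodically fetches the fsimage and edits log files from the active NameNode, merges them locally, and uploads the new fsimage back to the active NameNode. Secondary NameNode: it performs the same periodic fetch-and-merge cycle and likewise transfers the merged fsimage back to the NameNode. Backup node: it provides the same checkpointing as the Checkpoint node and additionally keeps an in-memory copy of the namespace that is continuously synchronized with the active NameNode, so it does not need to download the fsimage and edits files to create a checkpoint.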

Is the Secondary NameNode a backup of the NameNode?

No, the Secondary NameNode is not a backup of the NameNode; it is better described as a helper of the NameNode. The NameNode is the master daemon that maintains and manages the DataNodes. It regularly receives a heartbeat and a block report from every DataNode in the cluster to ensure that the DataNodes are live.
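The heartbeat part of that description can be illustrated with a small Python sketch. `HeartbeatMonitor` and its methods are hypothetical names, not HDFS classes; the sketch only records when each DataNode last reported and classifies nodes as live or dead against a timeout. Block reports are not modeled, and `timeout_s` is an arbitrary illustrative parameter, not the value HDFS actually uses.

```python
import time


class HeartbeatMonitor:
    """Hypothetical tracker of DataNode liveness based on the time of the last heartbeat.

    Block reports are not modeled. timeout_s is an illustrative parameter,
    not the value HDFS actually uses.
    """

    def __init__(self, timeout_s, clock=time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock          # injectable so the example can use a fake clock
        self.last_seen = {}         # DataNode id -> time of its last heartbeat

    def heartbeat(self, datanode_id):
        # Record when we last heard from this DataNode.
        self.last_seen[datanode_id] = self.clock()

    def live_nodes(self):
        now = self.clock()
        return sorted(n for n, t in self.last_seen.items() if now - t <= self.timeout_s)

    def dead_nodes(self):
        now = self.clock()
        return sorted(n for n, t in self.last_seen.items() if now - t > self.timeout_s)


# Usage with a fake clock so the result is deterministic.
fake_now = [0.0]
monitor = HeartbeatMonitor(timeout_s=30, clock=lambda: fake_now[0])
monitor.heartbeat("dn1")
monitor.heartbeat("dn2")
fake_now[0] = 20.0
monitor.heartbeat("dn1")      # dn2 stays silent from now on
fake_now[0] = 45.0
print(monitor.live_nodes(), monitor.dead_nodes())  # ['dn1'] ['dn2']
```

Injecting the clock keeps the liveness logic separate from real time, which makes the behavior easy to check without sleeping.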


1 Answer

The namenode stores the HDFS filesystem information in a file named fsimage. Updates to the filesystem (add/remove blocks) do not update the fsimage file; instead they are logged to a separate edit log file, so the I/O is fast, append-only streaming rather than random file writes. When restarting, the namenode reads the fsimage and then applies all the changes from the log file to bring the in-memory filesystem state up to date. This process takes time, and the longer the edit log, the longer it takes.
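To make the fsimage/edit-log split concrete, here is a minimal Python sketch. `ToyNameNode` and its methods are hypothetical illustrations, not HDFS APIs: the namespace is modeled as a dict from path to block ids, mutations only append to an in-memory edit log, and `restart()` rebuilds state by loading the snapshot and replaying every logged edit.

```python
from dataclasses import dataclass, field


@dataclass
class ToyNameNode:
    """Hypothetical, heavily simplified model of NameNode metadata persistence.

    fsimage: snapshot of the namespace (path -> list of block ids).
    edit_log: append-only list of operations recorded since the snapshot.
    """
    fsimage: dict = field(default_factory=dict)
    edit_log: list = field(default_factory=list)

    def add_block(self, path, block_id):
        # Mutations are only appended to the log; the fsimage is not rewritten.
        self.edit_log.append(("add", path, block_id))

    def remove_block(self, path, block_id):
        self.edit_log.append(("remove", path, block_id))

    def restart(self):
        """Rebuild the in-memory namespace: load the fsimage, then replay every logged edit."""
        namespace = {p: list(blocks) for p, blocks in self.fsimage.items()}
        for op, path, block_id in self.edit_log:
            if op == "add":
                namespace.setdefault(path, []).append(block_id)
            elif op == "remove":
                namespace[path].remove(block_id)
        # Return the rebuilt state and how many edits had to be replayed.
        return namespace, len(self.edit_log)


nn = ToyNameNode(fsimage={"/a": [1]})
nn.add_block("/a", 2)
nn.add_block("/b", 3)
nn.remove_block("/a", 1)
state, replayed = nn.restart()
print(state, replayed)  # {'/a': [2], '/b': [3]} 3
```

The replay count grows with every mutation since the last snapshot, which is the startup cost the answer describes.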

The secondarynamenode's job is not to be a secondary to the namenode, but only to periodically read the filesystem changes log and apply those changes to the fsimage file, thus bringing it up to date. This allows the namenode to start up faster next time, because it has fewer logged changes to replay.
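A checkpoint in this sense can be sketched as a pure function, again in Python and under simplifying assumptions. `checkpoint` is a hypothetical helper, not part of Hadoop: it applies the logged edits to a copy of the fsimage and returns the merged image with an empty log. The transfer of files between the daemons is omitted, and the sketch ignores edits that arrive while the merge is running.

```python
def checkpoint(fsimage, edit_log):
    """Hypothetical sketch of a secondarynamenode-style checkpoint.

    Applies every logged edit to a copy of the fsimage (path -> list of block ids)
    and returns the merged image plus a fresh, empty edit log. File transfer between
    daemons and edits arriving during the merge are not modeled.
    """
    merged = {path: list(blocks) for path, blocks in fsimage.items()}
    for op, path, block_id in edit_log:
        if op == "add":
            merged.setdefault(path, []).append(block_id)
        elif op == "remove":
            merged[path].remove(block_id)
    return merged, []


image, log = checkpoint(
    {"/a": [1]},
    [("add", "/a", 2), ("add", "/b", 3), ("remove", "/a", 1)],
)
print(image, len(log))  # {'/a': [2], '/b': [3]} 0
```

After the checkpoint, a restart only has to load `image`; there is nothing left in `log` to replay.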

Unfortunately, despite its name, the secondarynamenode service is not a standby namenode. Specifically, it does not offer HA for the namenode. This is well illustrated here.

See Understanding NameNode Startup Operations in HDFS.

Note that more recent distributions (as of Hadoop 2.6) offer namenode High Availability using NFS (shared storage) and/or namenode High Availability using the Quorum Journal Manager.
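The core idea behind the Quorum Journal Manager is that each edit is written to a group of JournalNodes and counts as durable once a majority has acknowledged it, so a group of 2N+1 JournalNodes tolerates N failures. The Python sketch below shows only that majority rule; `ToyJournalNode` and `quorum_write` are hypothetical names, and the real protocol's writer epochs, fencing, and recovery of partially written edits are deliberately omitted.

```python
class ToyJournalNode:
    """Hypothetical in-memory stand-in for a JournalNode that may be unreachable."""

    def __init__(self, up=True):
        self.up = up
        self.entries = []  # (txid, op) pairs this node has accepted

    def append(self, txid, op):
        if not self.up:
            return False  # an unreachable node cannot acknowledge the write
        self.entries.append((txid, op))
        return True


def quorum_write(journal_nodes, txid, op):
    """Send the edit to every node; treat it as committed only with a strict majority of acks."""
    acks = sum(1 for jn in journal_nodes if jn.append(txid, op))
    return acks > len(journal_nodes) // 2


nodes = [ToyJournalNode(), ToyJournalNode(), ToyJournalNode(up=False)]
print(quorum_write(nodes, 1, "mkdir /x"))  # True: 2 of 3 acknowledged
nodes[1].up = False
print(quorum_write(nodes, 2, "mkdir /y"))  # False: only 1 of 3 acknowledged
```

Requiring a strict majority means any two successful writes share at least one JournalNode, which is what lets a newly active NameNode find every committed edit.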

Remus Rusanu answered Oct 07 '22 22:10