
Name Node stores what?

  1. In the case of the NameNode, what is stored in main memory and what is stored in secondary memory (hard disk)?

  2. What do we mean by "file to block mapping"?

  3. What exactly are the fsimage and edit logs?

Chirag asked Dec 20 '13 01:12



1 Answer

In the case of the NameNode, what is stored in main memory and what is stored in secondary memory (hard disk)?

The file-to-block mapping, the locations of blocks on DataNodes, the list of active DataNodes, and a good deal of other metadata are all held in memory on the NameNode. Pretty much everything you see on the NameNode status web page is served from memory.

The only things stored on disk are the fsimage, the edit log, and status logs. Interestingly, the NameNode never really reads these files back except when it starts; during normal operation it only appends to the edit log. The fsimage and edits files exist pretty much only so that the NameNode can be brought back up after it is stopped or crashes.
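To make the memory/disk split concrete, here is a minimal toy sketch in Python. It is not how HDFS is implemented; the class `ToyNameNode` and its method names are made up for illustration. It assumes, as real HDFS does, that block locations are not persisted and are instead learned from block reports sent by the DataNodes.

```python
from dataclasses import dataclass, field


@dataclass
class ToyNameNode:
    """Hypothetical toy model of NameNode state; not the real HDFS classes."""

    # Namespace metadata: this is what a checkpoint (fsimage) would capture.
    file_to_blocks: dict = field(default_factory=dict)
    # Block locations: held in memory only, rebuilt from DataNode block reports.
    block_locations: dict = field(default_factory=dict)

    def persistable_state(self):
        """Return a copy of the part of the state that would go to disk."""
        return {path: list(blocks) for path, blocks in self.file_to_blocks.items()}

    def receive_block_report(self, datanode, block_ids):
        """Record that `datanode` holds a replica of each listed block."""
        for block_id in block_ids:
            self.block_locations.setdefault(block_id, set()).add(datanode)


nn = ToyNameNode()
nn.file_to_blocks["/file.txt"] = ["blk_0", "blk_1"]
nn.receive_block_report("datanode-1", ["blk_0", "blk_1"])
nn.receive_block_report("datanode-2", ["blk_0"])
```

After a restart, only `persistable_state()` would come back from disk in this model; `block_locations` starts empty until the DataNodes report in again.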

What do we mean by "file to block mapping"?

When a file is put into HDFS, it is split into blocks of a configurable size. Say you have a file called "file.txt" that is 201MB and your block size is 64MB. You end up with three 64MB blocks and one 9MB block (64+64+64+9 = 201). The NameNode keeps track of the fact that "file.txt" in HDFS maps to these four blocks. DataNodes store blocks, not files, so this mapping is what tells you where your data is and which file it belongs to.
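The arithmetic in that example can be checked with a few lines of Python. This is only a sketch of the size calculation; the function name `split_into_blocks` and the `blk_N` identifiers are hypothetical, and real HDFS assigns its own block IDs.

```python
def split_into_blocks(file_size_mb, block_size_mb=64):
    """Return the sizes (in MB) of the blocks a file of this size would occupy."""
    if file_size_mb < 0 or block_size_mb <= 0:
        raise ValueError("file size must be >= 0 and block size must be > 0")
    full_blocks, remainder = divmod(file_size_mb, block_size_mb)
    sizes = [block_size_mb] * full_blocks
    if remainder:
        # The last block only holds what is left over; it is not padded.
        sizes.append(remainder)
    return sizes


block_sizes = split_into_blocks(201, 64)

# A toy file-to-block mapping with made-up block IDs.
file_to_blocks = {"file.txt": [f"blk_{i}" for i in range(len(block_sizes))]}
```

Here `block_sizes` is `[64, 64, 64, 9]`, and `file_to_blocks` maps "file.txt" to four block IDs, which is the kind of record the NameNode keeps.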

What exactly are the fsimage and edit logs?

The fsimage holds a recent checkpoint of the NameNode's in-memory state. The NameNode's state as of that checkpoint (i.e. file->block mapping, file properties, etc.) can be restored from this file.

The edits file holds all the updates made since the last checkpoint stored in the fsimage. These are things like a file being added or deleted. This matters if your NameNode goes down, because the edits file has the most recent changes that the fsimage does not yet contain. When the NameNode comes up, it loads the fsimage into memory and then applies the edits in the order they appear in the edits file.
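The startup sequence just described (load the checkpoint, then replay the edits in order) can be sketched as follows. This is a toy model, not the real on-disk formats: the fsimage is assumed to be a plain dict, each edit a hypothetical `(operation, path, blocks)` tuple, and the function names are invented for illustration.

```python
import copy


def apply_edit(namespace, edit):
    """Apply a single logged operation to the in-memory namespace."""
    op, path, blocks = edit
    if op == "add":
        namespace[path] = list(blocks)
    elif op == "delete":
        namespace.pop(path, None)
    else:
        raise ValueError(f"unknown operation: {op}")


def start_namenode(fsimage, edits):
    """Rebuild in-memory state: load the checkpoint, then replay edits in order."""
    namespace = copy.deepcopy(fsimage)  # leave the checkpoint itself untouched
    for edit in edits:
        apply_edit(namespace, edit)
    return namespace


# Toy checkpoint and edit log with made-up paths and block IDs.
fsimage = {"/file.txt": ["blk_0", "blk_1", "blk_2", "blk_3"]}
edits = [
    ("add", "/new.txt", ["blk_4"]),
    ("delete", "/file.txt", None),
]
state = start_namenode(fsimage, edits)
```

Order matters here: replaying the same edits in a different order could leave a deleted file in place or drop one that was re-added later.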

The fsimage and edits are split this way because rewriting the potentially massive fsimage file every time an HDFS operation is done would be hard on the system. Instead, each operation is simply appended to the edits file. However, to keep NameNode startup fast and to stop the edits file from growing without bound, it is a good idea to roll the edits into the fsimage every now and then.

The SecondaryNameNode is the process that periodically takes the fsimage and edits files and merges them into a new checkpointed fsimage file. This is important for keeping the edits file from getting huge.
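As a rough illustration of that merge, here is a toy checkpoint in Python. It assumes the same simplified model as before (fsimage as a dict, edits as hypothetical `(operation, path, blocks)` tuples) and says nothing about how the real SecondaryNameNode transfers or writes files.

```python
def checkpoint(fsimage, edits):
    """Fold the edit log into a copy of the fsimage.

    Returns the new fsimage and a fresh, empty edit log.
    """
    merged = {path: list(blocks) for path, blocks in fsimage.items()}
    for op, path, blocks in edits:
        if op == "add":
            merged[path] = list(blocks)
        elif op == "delete":
            merged.pop(path, None)
        else:
            raise ValueError(f"unknown operation: {op}")
    return merged, []


old_fsimage = {"/a.txt": ["blk_0"]}
old_edits = [
    ("add", "/b.txt", ["blk_1", "blk_2"]),
    ("delete", "/a.txt", None),
    ("add", "/c.txt", ["blk_3"]),
]
new_fsimage, new_edits = checkpoint(old_fsimage, old_edits)
```

After the merge, a restart only has to load `new_fsimage` and replay whatever has been appended since, rather than the whole history of edits.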

Donald Miner answered Sep 27 '22 16:09
