In the case of the NameNode, what gets stored in main memory and what gets stored in secondary memory (hard disk)?
What do we mean by "file to block mapping"?
What exactly are the fsimage and edit logs?
In the case of the NameNode, what gets stored in main memory and what gets stored in secondary memory (hard disk)?
The file-to-block mapping, the locations of blocks on DataNodes, the list of active DataNodes, and a bunch of other metadata are all stored in memory on the NameNode. Pretty much everything you see on the NameNode status web page comes from this in-memory state.
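To make the in-memory side concrete, here is a minimal Python sketch of that metadata as plain dictionaries and sets. The class and field names (NameNodeMetadata, file_to_blocks, block_locations, live_datanodes) are hypothetical and are not the real HDFS classes. The sketch only shows how the pieces relate: to find a file's data, you go from the file to its blocks, then from each block to the live DataNodes holding it.

```python
from dataclasses import dataclass, field


@dataclass
class NameNodeMetadata:
    """Toy model of the NameNode's in-memory metadata (hypothetical names)."""
    # file path -> ordered list of block IDs that make up the file
    file_to_blocks: dict[str, list[str]] = field(default_factory=dict)
    # block ID -> DataNodes reported to hold a replica of that block
    block_locations: dict[str, set[str]] = field(default_factory=dict)
    # DataNodes currently considered alive
    live_datanodes: set[str] = field(default_factory=set)

    def locate(self, path: str) -> list[tuple[str, list[str]]]:
        """Return each block of `path` with the live DataNodes holding it."""
        return [
            (block, sorted(self.block_locations.get(block, set()) & self.live_datanodes))
            for block in self.file_to_blocks[path]
        ]


meta = NameNodeMetadata(
    file_to_blocks={"/file.txt": ["blk_1", "blk_2"]},
    block_locations={"blk_1": {"dn1", "dn2"}, "blk_2": {"dn2", "dn3"}},
    live_datanodes={"dn1", "dn2"},
)
print(meta.locate("/file.txt"))  # [('blk_1', ['dn1', 'dn2']), ('blk_2', ['dn2'])]
```

Because every lookup here is a dictionary or set operation in RAM, answering "where is this file?" never touches the disk, which is the point the paragraph above makes.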
The only things stored on disk are the fsimage, the edit log, and status logs. It's interesting to note that the NameNode never really reads these files except when it starts; during normal operation it only appends new changes to the edit log. The fsimage and edits files pretty much exist only to bring the NameNode back up if it needs to be stopped or it crashes.
What do we mean by "file to block mapping"?
When a file is put into HDFS, it is split into blocks of a configurable size. Say you have a file called "file.txt" that is 201MB and your block size is 64MB. You will end up with three 64MB blocks and one 9MB block (64 + 64 + 64 + 9 = 201). The NameNode keeps track of the fact that "file.txt" in HDFS maps to these four blocks. DataNodes store blocks, not files, so this mapping is what tells you where your data is and which blocks make up each file.
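The arithmetic in the "file.txt" example can be checked with a small Python helper. The function name split_into_blocks is hypothetical and sizes are whole megabytes for simplicity. It only illustrates the rule described above: full blocks first, then one smaller block for whatever is left over.

```python
def split_into_blocks(file_size_mb: int, block_size_mb: int) -> list[int]:
    """Return the sizes (in MB) of the blocks a file is split into."""
    if block_size_mb <= 0:
        raise ValueError("block size must be positive")
    full_blocks, remainder = divmod(file_size_mb, block_size_mb)
    blocks = [block_size_mb] * full_blocks
    if remainder:
        # The last block holds only the leftover data, not a full block's worth.
        blocks.append(remainder)
    return blocks


print(split_into_blocks(201, 64))  # [64, 64, 64, 9]
```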
What exactly are the fsimage and edit logs?
A recent checkpoint of the NameNode's in-memory state is stored in the fsimage. The NameNode's state from that checkpoint (e.g. the file->block mapping and file properties) can be restored from this file.
The edits file contains all the updates made since the last checkpoint was written to the fsimage. These are things like a file being added or deleted. This matters if your NameNode goes down, because the edits file holds the most recent changes that are not yet in the fsimage. When the NameNode comes up, it loads the fsimage into memory and then applies the edits in the order they appear in the edits file.
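The startup sequence (load the checkpoint, then replay the edits in order) can be sketched in Python as follows. This is a toy model under stated assumptions: the fsimage is represented as a dict from path to block list, and each edit is a dict with hypothetical "add" and "delete" operations. Real HDFS uses its own binary file formats and many more operation types.

```python
import copy


def apply_edit(namespace: dict[str, list[str]], edit: dict) -> None:
    """Apply one logged operation to the namespace (hypothetical op names)."""
    if edit["op"] == "add":
        namespace[edit["path"]] = list(edit["blocks"])
    elif edit["op"] == "delete":
        namespace.pop(edit["path"], None)
    else:
        raise ValueError(f"unknown op: {edit['op']}")


def start_namenode(fsimage: dict[str, list[str]], edits: list[dict]) -> dict[str, list[str]]:
    """Load the checkpoint into memory, then replay the edits in log order."""
    namespace = copy.deepcopy(fsimage)  # in-memory copy of the checkpoint
    for edit in edits:                  # order matters: later edits override earlier state
        apply_edit(namespace, edit)
    return namespace


fsimage = {"/file.txt": ["blk_1", "blk_2", "blk_3", "blk_4"]}
edits = [
    {"op": "add", "path": "/new.txt", "blocks": ["blk_5"]},
    {"op": "delete", "path": "/file.txt"},
]
print(start_namenode(fsimage, edits))  # {'/new.txt': ['blk_5']}
```

Replaying in log order is what makes the result correct: swapping an add and a delete of the same path would leave a different namespace.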
The fsimage and edits files exist the way they do because rewriting the potentially massive fsimage file on every HDFS operation would be hard on the system. Instead, the edits file is simply appended to. However, rolling the edits into the fsimage every now and then is still a good thing: it keeps NameNode startup fast, since there are fewer edits to replay, and it keeps the edits file from consuming ever more disk space.
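The "just append" idea can be sketched in Python with a hypothetical EditLog class that writes one JSON record per line. The JSON-lines format and the class name are assumptions for illustration only; the real HDFS edits file uses its own binary format. What the sketch does show is that recording an operation touches only the end of the file, no matter how large the namespace has grown.

```python
import json
import os
import tempfile


class EditLog:
    """Hypothetical append-only edits file holding one JSON record per line."""

    def __init__(self, path: str) -> None:
        self.path = path

    def record(self, op: str, hdfs_path: str, **details) -> None:
        # Open in append mode: each operation adds one line at the end of the
        # file instead of rewriting a full snapshot of the namespace.
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps({"op": op, "path": hdfs_path, **details}) + "\n")
            f.flush()
            os.fsync(f.fileno())  # force the record to disk before returning

    def read(self) -> list[dict]:
        """Return every recorded operation, oldest first."""
        if not os.path.exists(self.path):
            return []
        with open(self.path, encoding="utf-8") as f:
            return [json.loads(line) for line in f if line.strip()]


with tempfile.TemporaryDirectory() as tmp:
    log = EditLog(os.path.join(tmp, "edits"))
    log.record("add", "/file.txt", blocks=["blk_1", "blk_2"])
    log.record("delete", "/file.txt")
    print(log.read())
```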
The SecondaryNameNode is the process that periodically takes the fsimage and edits files and merges them into a new checkpointed fsimage file. This is an important process that keeps the edits file from getting huge.
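The merge step itself can be sketched in Python using the same toy representation as before: the fsimage is a dict from path to block list, and edits are dicts with hypothetical "add" and "delete" operations. The function name checkpoint is an assumption, and the sketch leaves out how the files move between the NameNode and the SecondaryNameNode. It only shows that folding the edits into the image yields a new fsimage and lets the edits start over empty.

```python
import copy


def checkpoint(
    fsimage: dict[str, list[str]], edits: list[dict]
) -> tuple[dict[str, list[str]], list[dict]]:
    """Merge edits into fsimage; return the new checkpoint and an empty edits log."""
    merged = copy.deepcopy(fsimage)
    for edit in edits:  # replay in log order, exactly as a restart would
        if edit["op"] == "add":
            merged[edit["path"]] = list(edit["blocks"])
        elif edit["op"] == "delete":
            merged.pop(edit["path"], None)
        else:
            raise ValueError(f"unknown op: {edit['op']}")
    return merged, []  # edits start over, so they never grow without bound


new_image, new_edits = checkpoint(
    {"/a.txt": ["blk_1"]},
    [
        {"op": "add", "path": "/b.txt", "blocks": ["blk_2"]},
        {"op": "delete", "path": "/a.txt"},
    ],
)
print(new_image, new_edits)  # {'/b.txt': ['blk_2']} []
```

After a checkpoint like this, a restarting NameNode loads the new fsimage and has little or nothing to replay, which is the startup benefit described above.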