Can anyone give a detailed analysis of the NameNode's memory consumption? Or is there some reference material? I can't find anything about it online. Thank you!
Here is a rule of thumb: allocate 1,000 MB to the NameNode per million blocks stored in HDFS. With the default 128 MB block size, that works out to roughly 1,000 MB of heap to manage a cluster with 128 TB of raw disk space. Note that the 1,000 MB is used just by the NameNode process for holding the block metadata in memory.
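A minimal back-of-the-envelope sketch of that rule in Python; the 128 MB block size is HDFS's default, and treating the raw capacity as unreplicated data is my assumption, so adjust both for your cluster:

# Rough NameNode heap estimate: ~1,000 MB of heap per million blocks.
# block_size_mb and replication are assumptions -- tune them for your cluster.

def namenode_heap_mb(raw_disk_tb, block_size_mb=128, replication=1):
    """Estimate NameNode heap (MB) needed for a given raw disk capacity."""
    blocks = raw_disk_tb * 1024 * 1024 / (block_size_mb * replication)
    return blocks / 1_000_000 * 1_000   # ~1,000 MB per million blocks

print(namenode_heap_mb(128))   # 128 TB, 128 MB blocks -> ~1 million blocks -> ~1,000 MB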
The HDFS NameNode stores metadata: the number of data blocks, file names, paths, block IDs, block locations, the number of replicas, and also slave (DataNode) related configuration. This metadata is kept in memory on the master for fast retrieval.
The NameNode is critical to HDFS: when the NameNode is down, the HDFS/Hadoop cluster is inaccessible and considered down. The NameNode is a single point of failure in a Hadoop cluster, and it is usually configured with a lot of RAM because the block locations are held in main memory.
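If you want to see the live numbers rather than estimates, here is a hedged sketch that reads the file/block counts and heap usage from the NameNode's JMX JSON servlet. The host name is hypothetical, and I'm assuming the web UI port is 9870 (Hadoop 3.x; older clusters typically use 50070):

import json
import urllib.request

NAMENODE = "http://namenode.example.com:9870"   # hypothetical host; adjust port for your version

def jmx_bean(query):
    # Query the NameNode's /jmx servlet for a single MBean.
    with urllib.request.urlopen(f"{NAMENODE}/jmx?qry={query}") as resp:
        return json.load(resp)["beans"][0]

fs = jmx_bean("Hadoop:service=NameNode,name=FSNamesystem")
heap = jmx_bean("java.lang:type=Memory")["HeapMemoryUsage"]

print("files:", fs["FilesTotal"], "blocks:", fs["BlocksTotal"])
print("heap used: %.0f MB of %.0f MB" % (heap["used"] / 2**20, heap["max"] / 2**20))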
I suppose the memory consumption depends on your HDFS setup: it scales with the overall size of the HDFS namespace and is relative to the block size. From the Hadoop NameNode wiki:
Use a good server with lots of RAM. The more RAM you have, the bigger the file system, or the smaller the block size.
From https://twiki.opensciencegrid.org/bin/view/Documentation/HadoopUnderstanding:
Namenode: The core metadata server of Hadoop. This is the most critical piece of the system, and there can only be one of these. This stores both the file system image and the file system journal. The namenode keeps all of the filesystem layout information (files, blocks, directories, permissions, etc) and the block locations. The filesystem layout is persisted on disk and the block locations are kept solely in memory. When a client opens a file, the namenode tells the client the locations of all the blocks in the file; the client then no longer needs to communicate with the namenode for data transfer.
The same site recommends the following:
Namenode: We recommend at least 8GB of RAM (minimum is 2GB RAM), preferably 16GB or more. A rough rule of thumb is 1GB per 100TB of raw disk space; the actual requirement is around 1GB per million objects (files, directories, and blocks). The CPU requirement is any modern multi-core server CPU. Typically, the namenode will only use 2-5% of your CPU. As this is a single point of failure, the most important requirement is reliable hardware rather than high-performance hardware. We suggest a node with redundant power supplies and at least 2 hard drives.
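For completeness, here are the two rules of thumb from that quote as a small sketch; the example inputs are hypothetical:

def heap_gb_from_raw_disk(raw_disk_tb):
    # Rough rule: ~1 GB of heap per 100 TB of raw disk space.
    return raw_disk_tb / 100.0

def heap_gb_from_objects(files, directories, blocks):
    # Actual requirement: ~1 GB per million objects (files, directories, blocks).
    return (files + directories + blocks) / 1_000_000

print(heap_gb_from_raw_disk(128))                        # 128 TB raw -> ~1.3 GB
print(heap_gb_from_objects(300_000, 50_000, 1_000_000))  # hypothetical namespace -> ~1.35 GB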
For a more detailed analysis of memory usage, check this link out: https://issues.apache.org/jira/browse/HADOOP-1687
You also might find this question interesting: Hadoop namenode memory usage
There are several technical limits to the NameNode (NN), and facing any of them will limit your scalability.
Example calculation
200 node cluster
24TB/node
128MB block size
Replication factor = 3
How much NameNode memory is required?
# blocks = 200 * 24 * 2^20 / (128 * 3)
≈ 13 million blocks
≈ 13,000 MB of NameNode memory (at ~1,000 MB per million blocks).
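The same arithmetic as a short Python sketch, so you can plug in your own cluster parameters:

nodes = 200
raw_tb_per_node = 24
block_size_mb = 128
replication = 3

raw_mb = nodes * raw_tb_per_node * 2**20            # total raw disk in MB
blocks = raw_mb / (block_size_mb * replication)     # unique blocks once replication is factored out
heap_mb = blocks / 1_000_000 * 1_000                # ~1,000 MB of heap per million blocks

print(f"~{blocks / 1e6:.1f} million blocks -> ~{heap_mb:,.0f} MB of NameNode heap")
# ~13.1 million blocks -> ~13,107 MB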