 

Hadoop HDFS - Difference between Missing replica and Under replicated blocks

Tags:

hadoop

hdfs

fsck

I know that both under-replicated blocks and missing replicas occur when there are fewer DataNodes available than the configured replication factor requires.

But what is the difference between them?

After resetting the replication factor to 1 (with only 1 DataNode available), both the under-replicated blocks and the missing replica errors were cleared. I verified this by running hdfs fsck / and checking the FSCK report.
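For reference, lowering the replication factor and re-checking can be done with the standard HDFS commands (the path / here just means the whole filesystem):

```shell
# Set the replication factor to 1 for everything under / and
# wait (-w) until the change has been applied to all blocks
hdfs dfs -setrep -w 1 /

# Re-run the filesystem check; on a single-DataNode cluster the
# summary should now report 0 under-replicated blocks
hdfs fsck / -files -blocks -locations
```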

Dinesh Kumar P asked Oct 13 '16 09:10

People also ask

What are under replicated blocks in HDFS?

Some files in your HDFS file system are corrupted, either because they have lost their last block replica or because they are under-replicated. When a new DataNode is added, HDFS replicates these blocks. Even if the replication factor is set to 1, HDFS may still report these blocks as under-replicated, since a single replica is not fault tolerant.

What is under replicated block?

Under-replicated blocks These are blocks that do not meet their target replication for the file they belong to. HDFS will automatically create new replicas of under-replicated blocks until they meet the target replication.

Why does Hadoop have 3 replicas?

The default replication factor is 3, and it can be configured as required: decreased to 2 or increased beyond 3. A replication factor of 3 is considered ideal for the following reason: if one copy is inaccessible or corrupted, the data can still be read from another copy.

Does HDFS replicate file blocks?

Data Replication. HDFS is designed to reliably store very large files across machines in a large cluster. It stores each file as a sequence of blocks; all blocks in a file except the last block are the same size. The blocks of a file are replicated for fault tolerance.
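The block layout described above (every block the same size except possibly the last) can be sketched in a few lines of Python. This is a toy illustration, not HDFS code; the 128-byte block size is a scaled-down stand-in for the real default of 128 MB (dfs.blocksize).

```python
BLOCK_SIZE = 128  # scaled-down stand-in for HDFS's 128 MB default

def split_into_blocks(file_size: int, block_size: int = BLOCK_SIZE) -> list[int]:
    """Return the sizes of the blocks a file of file_size bytes occupies.
    Every block is block_size except possibly the last, which holds
    whatever remains."""
    if file_size == 0:
        return []
    full, rest = divmod(file_size, block_size)
    return [block_size] * full + ([rest] if rest else [])

# A 300-byte file with 128-byte blocks: two full blocks plus a 44-byte tail.
print(split_into_blocks(300))  # [128, 128, 44]
```

Each of these blocks is then replicated independently across DataNodes, which is why fsck reports replication status per block rather than per file.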


1 Answer

From "Hadoop: The Definitive Guide" by Tom White:

Over-replicated blocks These are blocks that exceed their target replication for the file they belong to. Normally, over-replication is not a problem, and HDFS will automatically delete excess replicas.

Under-replicated blocks These are blocks that do not meet their target replication for the file they belong to. HDFS will automatically create new replicas of under-replicated blocks until they meet the target replication. You can get information about the blocks being replicated (or waiting to be replicated) using hdfs dfsadmin -metasave.

Misreplicated blocks These are blocks that do not satisfy the block replica placement policy (see Replica Placement). For example, for a replication level of three in a multirack cluster, if all three replicas of a block are on the same rack, then the block is misreplicated because the replicas should be spread across at least two racks for resilience. HDFS will automatically re-replicate misreplicated blocks so that they satisfy the rack placement policy.

Corrupt blocks These are blocks whose replicas are all corrupt. Blocks with at least one noncorrupt replica are not reported as corrupt; the namenode will replicate the noncorrupt replica until the target replication is met.

Missing replicas These are blocks with no replicas anywhere in the cluster.
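The five categories above can be captured in a small classifier, which may make the distinctions easier to see. This is a hypothetical sketch, not HDFS internals: the replica representation (a list of (rack, is_corrupt) pairs) and the function name are invented for illustration.

```python
def classify_block(replicas, target, racks_in_cluster=2):
    """Classify a block roughly the way fsck's summary does.

    replicas         -- list of (rack, is_corrupt) pairs, one per existing replica
    target           -- the file's replication factor
    racks_in_cluster -- how many racks the cluster spans
    """
    if not replicas:
        return "missing"            # no replicas anywhere in the cluster
    live = [rack for rack, corrupt in replicas if not corrupt]
    if not live:
        return "corrupt"            # every replica is corrupt
    if len(live) < target:
        return "under-replicated"   # HDFS will create new replicas
    if len(live) > target:
        return "over-replicated"    # HDFS will delete excess replicas
    # Placement check: with target >= 2 in a multirack cluster,
    # replicas should span at least two racks for resilience.
    if target >= 2 and racks_in_cluster >= 2 and len(set(live)) < 2:
        return "mis-replicated"
    return "healthy"

# Three good replicas all on rack "r1": count is fine, placement is not.
print(classify_block([("r1", False), ("r1", False), ("r1", False)], 3))
# mis-replicated
```

Note the key distinction the question asks about: a block is under-replicated when it has too few replicas, but a missing replica means the block has no replicas left at all.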

Hope this answers your question.

MaxNevermind answered Sep 27 '22 23:09