 

Hadoop HDFS - Difference between Missing replica and Under replicated blocks

Tags:

hadoop

hdfs

fsck

I know that both under-replicated blocks and missing replicas occur when there are fewer DataNodes available than the configured replication factor requires.

But what is the difference between them?

After resetting the replication factor to 1 (with only 1 DataNode available), both the under-replicated blocks and the missing replica errors were cleared. I verified this by running hdfs fsck / and checking the FSCK report.
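For reference, lowering the replication factor and re-checking can be done with the standard HDFS commands (the path / here just means the whole filesystem):

```shell
# Set the replication factor to 1 for everything under / and
# wait (-w) until the change has been applied to all blocks
hdfs dfs -setrep -w 1 /

# Re-run the filesystem check; on a single-DataNode cluster the
# summary should now report 0 under-replicated blocks
hdfs fsck / -files -blocks -locations
```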

Dinesh Kumar P asked Oct 13 '16 09:10

People also ask

What are under replicated blocks in HDFS?

Some files in your HDFS file system are corrupted, either because they have lost their last block replica or because they are under-replicated. When a new DataNode is added, HDFS replicates these blocks. Even if the replication factor is set to 1, HDFS may still report these blocks as under-replicated, since a single replica is not fault tolerant.

What is under replicated block?

Under-replicated blocks These are blocks that do not meet their target replication for the file they belong to. HDFS will automatically create new replicas of under-replicated blocks until they meet the target replication.

Why does Hadoop have 3 replicas?

The default replication factor is 3, and it can be configured as required: decreased to 2 or increased beyond 3. A replication factor of 3 is considered ideal for the following reason: if one copy is inaccessible or corrupted, the data can still be read from another copy.

Does HDFS replicate file blocks?

Data Replication. HDFS is designed to reliably store very large files across machines in a large cluster. It stores each file as a sequence of blocks; all blocks in a file except the last block are the same size. The blocks of a file are replicated for fault tolerance.
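The block layout described above (every block the same size except possibly the last) can be sketched in a few lines of Python. This is a toy illustration, not HDFS code; the 128-byte block size is a scaled-down stand-in for the real default of 128 MB (dfs.blocksize).

```python
BLOCK_SIZE = 128  # scaled-down stand-in for HDFS's 128 MB default

def split_into_blocks(file_size: int, block_size: int = BLOCK_SIZE) -> list[int]:
    """Return the sizes of the blocks a file of file_size bytes occupies.
    Every block is block_size except possibly the last, which holds
    whatever remains."""
    if file_size == 0:
        return []
    full, rest = divmod(file_size, block_size)
    return [block_size] * full + ([rest] if rest else [])

# A 300-byte file with 128-byte blocks: two full blocks plus a 44-byte tail.
print(split_into_blocks(300))  # [128, 128, 44]
```

Each of these blocks is then replicated independently across DataNodes, which is why fsck reports replication status per block rather than per file.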


1 Answer

From "Hadoop: The Definitive Guide" by Tom White:

Over-replicated blocks These are blocks that exceed their target replication for the file they belong to. Normally, over-replication is not a problem, and HDFS will automatically delete excess replicas.

Under-replicated blocks These are blocks that do not meet their target replication for the file they belong to. HDFS will automatically create new replicas of under-replicated blocks until they meet the target replication. You can get information about the blocks being replicated (or waiting to be replicated) using hdfs dfsadmin -metasave.

Misreplicated blocks These are blocks that do not satisfy the block replica placement policy (see Replica Placement). For example, for a replication level of three in a multirack cluster, if all three replicas of a block are on the same rack, then the block is misreplicated because the replicas should be spread across at least two racks for resilience. HDFS will automatically re-replicate misreplicated blocks so that they satisfy the rack placement policy.

Corrupt blocks These are blocks whose replicas are all corrupt. Blocks with at least one noncorrupt replica are not reported as corrupt; the namenode will replicate the noncorrupt replica until the target replication is met.

Missing replicas These are blocks with no replicas anywhere in the cluster.
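The five categories above can be captured in a small classifier, which may make the distinctions easier to see. This is a hypothetical sketch, not HDFS internals: the replica representation (a list of (rack, is_corrupt) pairs) and the function name are invented for illustration.

```python
def classify_block(replicas, target, racks_in_cluster=2):
    """Classify a block roughly the way fsck's summary does.

    replicas         -- list of (rack, is_corrupt) pairs, one per existing replica
    target           -- the file's replication factor
    racks_in_cluster -- how many racks the cluster spans
    """
    if not replicas:
        return "missing"            # no replicas anywhere in the cluster
    live = [rack for rack, corrupt in replicas if not corrupt]
    if not live:
        return "corrupt"            # every replica is corrupt
    if len(live) < target:
        return "under-replicated"   # HDFS will create new replicas
    if len(live) > target:
        return "over-replicated"    # HDFS will delete excess replicas
    # Placement check: with target >= 2 in a multirack cluster,
    # replicas should span at least two racks for resilience.
    if target >= 2 and racks_in_cluster >= 2 and len(set(live)) < 2:
        return "mis-replicated"
    return "healthy"

# Three good replicas all on rack "r1": count is fine, placement is not.
print(classify_block([("r1", False), ("r1", False), ("r1", False)], 3))
# mis-replicated
```

Note the key distinction the question asks about: a block is under-replicated when it has too few replicas, but a missing replica means the block has no replicas left at all.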

Hope this answers your question.

MaxNevermind answered Sep 27 '22 23:09