Could anyone let me know how to fix missing replicas?
============================================================================
Total size: 3447348383 B
Total dirs: 120
Total files: 98
Total blocks (validated): 133 (avg. block size 25919912 B)
Minimally replicated blocks: 133 (100.0 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 21 (15.789474 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 3
Average block replication: 2.3834586
Corrupt blocks: 0
Missing replicas: 147 (46.37224 %)
Number of data-nodes: 3
Number of racks: 1
============================================================================
As per Hadoop: The Definitive Guide:
Corrupt or missing blocks are the biggest cause for concern, as it means data has been lost. By default, fsck leaves files with corrupt or missing blocks, but you can tell it to perform one of the following actions on them:
• Move the affected files to the /lost+found directory in HDFS, using the -move option. Files are broken into chains of contiguous blocks to aid any salvaging efforts you may attempt.
• Delete the affected files, using the -delete option. Files cannot be recovered after being deleted.
My question is: how do I find out which files are affected? I have already worked with Hive to get the required outputs without any issue. Will the missing replicas affect the performance/speed of query processing?
Regards,
Raj
Missing replicas should be self-healing over time, since the NameNode automatically schedules re-replication of under-replicated blocks. However, if you want to move the affected files to /lost+found, you can use:
hadoop fsck / -move
Or delete them with:
hadoop fsck / -delete
If you just want to identify the files with under-replicated blocks, use:
hadoop fsck / -files -blocks -locations
That will print a detailed report for every file, including each block's expected and actual replica counts.
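If you only want the list of affected paths, you can filter that report. This is a minimal sketch, assuming the per-block "Under replicated" lines that fsck prints (the output path /tmp/under_replicated.txt is just an example):

hadoop fsck / -files -blocks -locations | grep 'Under replicated' | awk -F: '{print $1}' | sort -u > /tmp/under_replicated.txt

If you don't want to wait for self-healing, re-applying the replication factor to those files should prompt the NameNode to schedule the missing copies. Here 3 matches the default replication factor shown in your report, and -w makes the command wait until replication finishes:

while read f; do
  hadoop fs -setrep -w 3 "$f"
done < /tmp/under_replicated.txt

Note that this only addresses under-replication; -move and -delete are really for corrupt or missing blocks, and your report shows zero corrupt blocks.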