If the replication factor in a cluster is changed, say from 5 to 3, and the cluster is restarted, what happens to the old file blocks? Will they be considered over-replicated and deleted, or does the new replication factor apply only to new files? In other words, are old file blocks replicated 5 times while new file blocks (created after the restart) are replicated 3 times? And what happens if the cluster is not restarted?
You can use the setrep command in the Hadoop file system shell. It changes the replication factor of a specific file to a given count, while the rest of the HDFS file system keeps the default replication factor.
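As a quick illustration (the path below is hypothetical), you could set a single file's replication factor to 3 and wait for the change to complete:

# Set the replication factor of one file to 3 and block until replication finishes (path is illustrative)
hadoop fs -setrep -w 3 /user/hadoop/data/sample.txt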
We can change the dfs.replication value to 4 in the $HADOOP_HOME/conf/hadoop-site.xml file, which will apply a replication factor of 4 to any new content written from then on.
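For reference, the property entry in that configuration file would look roughly like the sketch below, using the example value of 4 (on newer Hadoop releases this setting normally lives in hdfs-site.xml):

<!-- Default replication factor applied to newly created files -->
<property>
  <name>dfs.replication</name>
  <value>4</value>
</property>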
The replication factor is a property that can be set in the HDFS configuration file, allowing you to adjust the default replication factor for the entire cluster. For each block stored in HDFS with replication factor n, there will be n - 1 duplicate blocks distributed across the cluster.
If the replication factor is changed in the cluster, say from 5 to 3, and the cluster is restarted, what happens to the old file blocks?
Nothing happens to existing/old file blocks.
Will they be considered over-replicated and deleted, or does the replication factor apply only to new files?
The new replication factor will only apply to new files, because the replication factor is not an HDFS-wide setting but a per-file attribute.
Does that mean old file blocks are replicated 5 times and new file blocks (created after the restart) are replicated 3 times?
Yes. Existing files created with a replication factor of 5 will continue to carry 5 replicas of each block, while new files created under the new default of 3 will carry 3 replicas.
What happens if the cluster is not restarted?
Nothing changes whether or not you restart the cluster. Since the replication factor is a per-file attribute supplied by the client when a file is created, a cluster restart isn't required for the new default to take effect; you only need to update your client configs.
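A minimal sketch of this client-driven, per-file behavior: the client can even override its default for a single upload without touching any cluster-side setting (the local and HDFS paths below are illustrative):

# Upload a file with an explicit replication factor of 3, overriding the client's default
hadoop fs -D dfs.replication=3 -put localfile.txt /user/hadoop/localfile.txt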
If you also want to change the replication factor of all your old files, consider running the setrep command recursively, e.g.: hadoop fs -setrep -R 3 /
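Afterwards, you could spot-check that an old file's replication factor was actually updated, for example with the stat or fsck commands (the path below is illustrative):

# Print the current replication factor of a file
hadoop fs -stat %r /user/hadoop/old/file.txt
# Or report block-level replication details
hdfs fsck /user/hadoop/old/file.txt -files -blocks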