As the title indicates, when a client requests to write a file to the hdfs, how does the HDFS or name node choose which datanode to store the file? Does the hdfs try to store all the blocks of this file in the same node or some node in the same rack if it is too big? Does the hdfs provide any APIs for applications to store the file in a certain datanode as he likes?
how does the HDFS or name node choose which datanode to store the file?
HDFS has a BlockPlacementPolicyDefault, check the API documentation for more details. It should be possible to extend BlockPlacementPolicy for a custom behavior.
Does the hdfs provide any APIs for applications to store the file in a certain datanode as he likes?
The placement behavior should not be specific to a particular datanode. That's what makes HDFS resilient to failure and also scalable.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With