In the context of HDFS, we have a NameNode and DataNodes. What does it mean to say that the NameNode stores the file system namespace?
Also, is the directory we specify for the DataNode (in hdfs-site.xml) the only place where data can be stored, or can we specify other directories to hold the data?
With HDFS Federation, large deployments, or deployments that use a lot of small files, benefit from namespace scaling because more NameNodes can be added to the cluster. Performance: file system throughput is not limited by a single NameNode; adding more NameNodes to the cluster scales the file system's read/write throughput.
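To make "more NameNodes" concrete: in a federated cluster each NameNode owns part of the namespace, and clients are typically pointed at the right one through a client-side mount table (ViewFs). The sketch below is a simplified, hypothetical model of that lookup; the mount points and NameNode names are made up, and this is not the ViewFs API.

```python
# Hypothetical, simplified client-side mount table for a federated cluster:
# each path prefix is served by one NameNode. Names are illustrative only.
MOUNT_TABLE = {
    "/user": "namenode-1",
    "/data": "namenode-2",
    "/tmp": "namenode-3",
}


def resolve_namenode(path: str, mount_table: dict[str, str]) -> str:
    """Return the NameNode responsible for `path` via longest-prefix match."""
    best_prefix = None
    for prefix in mount_table:
        # Match whole path components only, so "/data" does not match "/database".
        if path == prefix or path.startswith(prefix.rstrip("/") + "/"):
            if best_prefix is None or len(prefix) > len(best_prefix):
                best_prefix = prefix
    if best_prefix is None:
        raise KeyError(f"no mount point covers {path!r}")
    return mount_table[best_prefix]


print(resolve_namenode("/user/alice/logs.txt", MOUNT_TABLE))  # namenode-1
print(resolve_namenode("/data/events/2024", MOUNT_TABLE))     # namenode-2
```

Because each NameNode only holds the metadata for its own subtree, metadata load (and the number of files the cluster can track) is spread across several NameNodes instead of one.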
How does HDFS store data? HDFS splits each file into blocks and stores each block on DataNodes. Multiple DataNodes are connected to the cluster's master node, the NameNode, which decides where the replicas of each block are placed across the cluster.
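A small sketch of the arithmetic, assuming the Hadoop 2.x+ defaults of a 128 MB block size and a replication factor of 3. The replica placement here is a deliberately naive round-robin chosen for illustration; it is not HDFS's rack-aware default placement policy, and the DataNode names are hypothetical.

```python
BLOCK_SIZE = 128 * 1024 * 1024  # assumed default dfs.blocksize (128 MB)
REPLICATION = 3                 # assumed default dfs.replication


def split_into_blocks(file_size: int, block_size: int = BLOCK_SIZE) -> list[int]:
    """Return the sizes of the blocks a file of `file_size` bytes occupies.

    The last block only takes as much space as the remaining data needs.
    """
    full, remainder = divmod(file_size, block_size)
    sizes = [block_size] * full
    if remainder:
        sizes.append(remainder)
    return sizes


def place_replicas(num_blocks: int, datanodes: list[str],
                   replication: int = REPLICATION) -> list[list[str]]:
    """Toy round-robin placement: each block gets `replication` distinct DataNodes.

    NOT the real HDFS policy; it only shows that every block ends up
    on several different DataNodes.
    """
    if replication > len(datanodes):
        raise ValueError("replication factor exceeds number of DataNodes")
    return [
        [datanodes[(b + r) % len(datanodes)] for r in range(replication)]
        for b in range(num_blocks)
    ]


blocks = split_into_blocks(300 * 1024 * 1024)      # a 300 MB file
print([size // (1024 * 1024) for size in blocks])  # [128, 128, 44]
print(place_replicas(len(blocks), ["dn1", "dn2", "dn3", "dn4"]))
```

A 300 MB file therefore becomes three blocks, and with three replicas each, nine block copies are spread over the DataNodes.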
Hadoop's FileSystem abstraction may be implemented as a distributed filesystem (such as HDFS) or as a "local" one that reflects the locally connected disk. The local version exists for small Hadoop instances and for testing.
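In Hadoop, the implementation is chosen from the URI scheme (for example `file://` versus `hdfs://`). The sketch below mimics that idea with a tiny, hypothetical Python interface; `FileSystem`, `LocalFileSystem`, `get_filesystem` and `REGISTRY` here are illustrative stand-ins, not Hadoop's Java API, and only the local scheme is implemented.

```python
import os
import tempfile
from abc import ABC, abstractmethod
from urllib.parse import urlparse


class FileSystem(ABC):
    """Hypothetical minimal file-system interface."""

    @abstractmethod
    def list_dir(self, path: str) -> list[str]: ...


class LocalFileSystem(FileSystem):
    """'Local' implementation that simply reflects the locally attached disk."""

    def list_dir(self, path: str) -> list[str]:
        return sorted(os.listdir(path))


# Only file:// is registered; an hdfs:// entry would talk to a NameNode.
REGISTRY = {"file": LocalFileSystem}


def get_filesystem(uri: str) -> FileSystem:
    """Pick an implementation from the URI scheme (defaulting to local)."""
    scheme = urlparse(uri).scheme or "file"
    try:
        return REGISTRY[scheme]()
    except KeyError:
        raise ValueError(f"no FileSystem registered for scheme {scheme!r}") from None


with tempfile.TemporaryDirectory() as d:
    open(os.path.join(d, "a.txt"), "w").close()
    uri = "file://" + d
    fs = get_filesystem(uri)
    print(fs.list_dir(urlparse(uri).path))  # ['a.txt']
```

Code written against the interface does not change when the scheme does, which is why the local variant is convenient for testing.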
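It means that the NameNode holds the metadata for the whole file system: the directory tree, file names, permissions and other attributes, and the list of blocks that make up each file. The file contents themselves live on the DataNodes. The NameNode keeps this namespace in memory and persists it on disk in the FsImage and EditLog files. When you put a file into HDFS, the NameNode inserts the file name into the file system tree and allocates data blocks for it; the client then writes the block data to DataNodes.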
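A toy model of that namespace, assuming a simplified structure: a tree of directories whose file entries store only metadata and an ordered list of block IDs, never the file bytes. The class names echo NameNode inode terminology but are written for illustration; this is not Hadoop code, and real block IDs and allocation are more involved.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class INodeFile:
    replication: int
    blocks: list[int] = field(default_factory=list)  # ordered block IDs


@dataclass
class INodeDirectory:
    children: dict = field(default_factory=dict)  # name -> INodeFile | INodeDirectory


class ToyNamespace:
    """Directory tree plus per-file block lists; no file contents live here."""

    def __init__(self) -> None:
        self.root = INodeDirectory()
        self._next_block_id = count(1)

    def create(self, path: str, replication: int = 3) -> INodeFile:
        """Insert a file name into the tree (creating parents like mkdir -p)."""
        *dirs, name = [p for p in path.split("/") if p]
        node = self.root
        for d in dirs:
            node = node.children.setdefault(d, INodeDirectory())
            if not isinstance(node, INodeDirectory):
                raise NotADirectoryError(d)
        if name in node.children:
            raise FileExistsError(path)
        f = INodeFile(replication)
        node.children[name] = f
        return f

    def add_block(self, f: INodeFile) -> int:
        """Allocate a new block ID and append it to the file's block list."""
        block_id = next(self._next_block_id)
        f.blocks.append(block_id)
        return block_id

    def lookup(self, path: str):
        node = self.root
        for part in (p for p in path.split("/") if p):
            node = node.children[part]
        return node


ns = ToyNamespace()
f = ns.create("/user/alice/data.csv")
ns.add_block(f)
ns.add_block(f)
print(ns.lookup("/user/alice/data.csv").blocks)  # [1, 2]
```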
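Yes, you can have any number of data directories, typically one per physical disk. List them, comma-separated, in hdfs-site.xml in the conf folder. In Hadoop 2.x and later the property is dfs.datanode.data.dir; the older name dfs.data.dir is deprecated but is still mapped to the new one: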
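<property>
  <!-- Hadoop 2.x+ name; Hadoop 1.x used dfs.data.dir -->
  <name>dfs.datanode.data.dir</name>
  <!-- Comma-separated list; example paths, replace with your own -->
  <value>/disk1/hdfs/data,/disk2/hdfs/data</value>
</property>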
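To check what a given hdfs-site.xml actually configures, a small parser like the sketch below can read the comma-separated list. It assumes the standard `<configuration><property><name/><value/></property></configuration>` layout, checks the newer key before the deprecated one, and the example paths are placeholders. It does not reproduce Hadoop's own defaults when neither key is set; it just returns an empty list.

```python
import xml.etree.ElementTree as ET

HDFS_SITE = """<configuration>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/disk1/hdfs/data, /disk2/hdfs/data</value>
  </property>
</configuration>"""

# Newer key first, then the deprecated Hadoop 1.x name.
DATA_DIR_KEYS = ("dfs.datanode.data.dir", "dfs.data.dir")


def datanode_dirs(xml_text: str) -> list[str]:
    """Return the configured DataNode directories, or [] if none are set."""
    root = ET.fromstring(xml_text)
    props = {
        p.findtext("name"): p.findtext("value", default="")
        for p in root.iter("property")
    }
    for key in DATA_DIR_KEYS:
        if key in props:
            # Comma-separated list; strip whitespace and skip empty entries.
            return [d.strip() for d in props[key].split(",") if d.strip()]
    return []


print(datanode_dirs(HDFS_SITE))  # ['/disk1/hdfs/data', '/disk2/hdfs/data']
```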