I installed the Cloudera CDH4 distribution on a single machine in pseudo-distributed mode and successfully tested that it was working correctly (e.g. I can run MapReduce programs, insert data on the Hive server, etc.). However, if I change the core-site.xml file to have fs.default.name set to the machine name rather than localhost and then restart the NameNode service, HDFS enters safe mode.
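For reference, before the change my core-site.xml pointed the default filesystem at localhost, roughly like this (a sketch of the stock pseudo-distributed setting; the exact file packaged with CDH4 may differ slightly):
<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:8020</value>
</property>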
Before the change to fs.default.name, I ran the following to check the state of HDFS:
$ hadoop dfsadmin -report
...
Configured Capacity: 18503614464 (17.23 GB)
Present Capacity: 13794557952 (12.85 GB)
DFS Remaining: 13790785536 (12.84 GB)
DFS Used: 3772416 (3.60 MB)
DFS Used%: 0.03%
Under replicated blocks: 2
Blocks with corrupt replicas: 0
Missing blocks: 0
Then I made the modification to core-site.xml (the machine name being hadoop):
<property>
  <name>fs.default.name</name>
  <value>hdfs://hadoop:8020</value>
</property>
I restarted the service and reran the report.
$ sudo service hadoop-hdfs-namenode restart
$ hadoop dfsadmin -report
...
Safe mode is ON
Configured Capacity: 0 (0 B)
Present Capacity: 0 (0 B)
DFS Remaining: 0 (0 B)
DFS Used: 0 (0 B)
DFS Used%: NaN%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
An interesting note is that I can still perform some HDFS commands. For example, I can run
$ hadoop fs -ls /tmp
However, if I try to read a file using hadoop fs -cat or try to put a file into HDFS, I am told the NameNode is in safe mode.
$ hadoop fs -put somefile .
put: Cannot create file/user/hadinstall/somefile._COPYING_. Name node is in safe mode.
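As an extra sanity check (not part of my original steps), the safe mode state can also be queried directly via dfsadmin's -safemode subcommand:
$ hadoop dfsadmin -safemode get
which confirms that safe mode is on, matching the report above.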
The reason I need fs.default.name to be set to the machine name is that I need to communicate with this machine on port 8020 (the default NameNode port). If fs.default.name is left as localhost, the NameNode service will not listen for external connection requests.
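One way to check this (my own verification step, not from the original report) is to look at which address the NameNode RPC port is bound to:
$ sudo netstat -tlpn | grep 8020
With fs.default.name set to localhost, the local address column shows 127.0.0.1:8020; with the machine name it should show the machine's own IP address instead, making the port reachable from other hosts.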
I am at a loss as to why this is happening and would appreciate any help.
Note that core-site.xml is used both by the Hadoop client, to get the URI of the default filesystem, and by the NameNode, to read its own address. This was surprising to me, because my understanding was that the NameNode reads all of its configuration parameters from hdfs-site.xml, with core-site.xml only defining fs.default.name.
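A quick way to see the client-side half of this (my own illustration, using the same hadoop hostname as above) is that fs.default.name is only a default: the client falls back to it when no scheme and authority are given, and an explicit URI overrides it:
$ hadoop fs -ls /tmp
$ hadoop fs -ls hdfs://hadoop:8020/tmp
Both commands list the same directory once the NameNode is reachable under that name.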
The fs.defaultFS (fs.default.name) setting makes HDFS a filesystem abstraction over the cluster, so that its root is not the same as the local system's. You need to set this value in order to use the distributed file system rather than the local one.
These configuration files are all found in the hadoop/conf directory. To set up HDFS you have to configure both core-site.xml and hdfs-site.xml.
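For a single-machine, pseudo-distributed setup, hdfs-site.xml typically only needs the replication factor lowered to 1 (a minimal sketch; the CDH4 packages also set properties such as the NameNode and DataNode storage directories):
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>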
The issue stemmed from domain name resolution. The /etc/hosts file needed to be modified to map the IP address of the hadoop machine to both localhost and its fully qualified domain name. (This also explains the zeroed-out report above: the DataNode could not register with the NameNode, so no blocks were reported and safe mode was never exited.)
192.168.0.201 hadoop.fully.qualified.domain.com localhost
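After editing /etc/hosts, it is worth verifying the fix (my own follow-up commands, using the hostnames from this example): confirm that the name resolves to the machine's address, restart the NameNode, and check that it leaves safe mode once the DataNode re-registers.
$ getent hosts hadoop.fully.qualified.domain.com
$ sudo service hadoop-hdfs-namenode restart
$ hadoop dfsadmin -safemode get
The last command should report that safe mode is off, and hadoop dfsadmin -report should again show the full configured capacity.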