I have set up a multi-node Hadoop cluster. The NameNode and the Secondary NameNode run on the same machine, and the cluster has only one DataNode. All the nodes are configured on Amazon EC2 machines.
masters
54.68.218.192 (public IP of the master node)

slaves
54.68.169.62 (public IP of the slave node)
core-site.xml
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
hdfs-site.xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/usr/local/hadoop_store/hdfs/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.name.dir</name>
    <value>file:/usr/local/hadoop_store/hdfs/datanode</value>
  </property>
</configuration>
core-site.xml
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://54.68.218.192:10001</value>
  </property>
</configuration>
mapred-site.xml
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>54.68.218.192:10002</value>
  </property>
</configuration>
hdfs-site.xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/usr/local/hadoop_store/hdfs/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.name.dir</name>
    <value>file:/usr/local/hadoop_store/hdfs/datanode</value>
  </property>
</configuration>
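One thing worth double-checking in the configuration above: the first core-site.xml points clients at hdfs://localhost:9000, while the second points at hdfs://54.68.218.192:10001. All nodes, the NameNode included, need to agree on a single NameNode URI, so a consistent core-site.xml on every machine might look like the sketch below (the master address and port 10001 are simply taken from the second file above, not a recommendation):

<configuration>
  <property>
    <!-- fs.default.name is deprecated in Hadoop 2.x in favor of fs.defaultFS,
         but either way every node must name the same host and port -->
    <name>fs.default.name</name>
    <value>hdfs://54.68.218.192:10001</value>
  </property>
</configuration>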
Running jps on the NameNode gives the following:
5696 NameNode
6504 Jps
5905 SecondaryNameNode
6040 ResourceManager
and jps on the DataNode:
2883 DataNode
3496 Jps
3381 NodeManager
which to me seems right.
Now when I try to run a put command:
hadoop fs -put count_inputfile /test/input/
It gives me the following error:
put: File /count_inputfile._COPYING_ could only be replicated to 0 nodes instead of minReplication (=1). There are 0 datanode(s) running and no node(s) are excluded in this operation.
The logs on the DataNode say the following:
hadoop-datanode log:
INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 54.68.218.192/54.68.218.192:10001. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
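The retries above suggest that nothing is answering on 54.68.218.192:10001. A quick way to see what address and port the NameNode RPC server is actually bound to (a sketch, assuming netstat is available and Hadoop runs as a Java process on the master):

# On the master: list listening TCP sockets owned by Java processes.
# If the NameNode is bound to 127.0.0.1:9000 rather than port 10001 on a
# reachable interface, the DataNode's connection attempts can never succeed.
sudo netstat -tlpn | grep java

# From the DataNode: probe the exact address the log keeps retrying
# (telnet is assumed to be installed).
telnet 54.68.218.192 10001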
yarn-nodemanager log:
INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8031. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
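This second log hints at a separate issue: the NodeManager is trying to reach a ResourceManager at 0.0.0.0:8031, which is what it falls back to when no ResourceManager host is configured. A hedged sketch of a fix, assuming the ResourceManager runs on the master, is to set the following in yarn-site.xml on every node:

<configuration>
  <property>
    <!-- Tells NodeManagers where the ResourceManager lives; without it
         they default to 0.0.0.0 and retry forever -->
    <name>yarn.resourcemanager.hostname</name>
    <value>54.68.218.192</value>
  </property>
</configuration>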
The NameNode web UI (port 50070) shows that there are 0 live nodes and 0 dead nodes, and DFS Used is 100%.
I have also disabled IPv6.
On a few websites I found that I should also edit the /etc/hosts file. I have edited it, and it looks like this:
127.0.0.1 localhost
172.31.25.151 ip-172-31-25-151.us-west-2.compute.internal
172.31.25.152 ip-172-31-25-152.us-west-2.compute.internal
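Since the logs retry the public IP while /etc/hosts maps only the private addresses, it is worth verifying how each name actually resolves on both machines; a minimal check (nothing Hadoop-specific) looks like this:

# Run on both master and slave: confirm the names resolve to the expected addresses.
getent hosts ip-172-31-25-151.us-west-2.compute.internal
getent hosts ip-172-31-25-152.us-west-2.compute.internal

# And confirm the DataNode can reach the master address from the logs at all.
ping -c 3 54.68.218.192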
Why am I still getting the error?
As soon as a DataNode is declared dead, the data blocks on the failed DataNode are replicated to other DataNodes based on the replication factor specified in the hdfs-site.xml file. Once the failed DataNode comes back, the NameNode will manage the replication factor again.
When the NameNode notices that it has not received a heartbeat message from a DataNode for a certain amount of time (about 10 minutes by default), the DataNode is marked as dead. Since its blocks will then be under-replicated, the system begins replicating the blocks that were stored on the dead DataNode.
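For concreteness, in Hadoop 2.x that dead-node timeout is derived from two hdfs-site.xml properties, roughly 2 * dfs.namenode.heartbeat.recheck-interval + 10 * dfs.heartbeat.interval, which with the defaults shown below comes to about 10.5 minutes:

<configuration>
  <property>
    <!-- Default: 300000 ms (5 minutes) -->
    <name>dfs.namenode.heartbeat.recheck-interval</name>
    <value>300000</value>
  </property>
  <property>
    <!-- Default: 3 seconds -->
    <name>dfs.heartbeat.interval</name>
    <value>3</value>
  </property>
</configuration>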
The DataNode daemon should be started manually using the $HADOOP_HOME/bin/hadoop-daemon.sh script. Once started, it contacts the master (NameNode) automatically and joins the cluster. The new node should also be added to the slaves file in the master server's configuration, since the cluster's script-based commands identify nodes through it.
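As a sketch of that manual start (the daemon scripts live under sbin/ in Hadoop 2.x; the bin/ path above matches older releases):

# On the DataNode host: start the HDFS and YARN worker daemons.
$HADOOP_HOME/sbin/hadoop-daemon.sh start datanode
$HADOOP_HOME/sbin/yarn-daemon.sh start nodemanager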
Two things worked for me:
STEP 1: Stop Hadoop and clean the temp files of hduser:
sudo rm -R /tmp/*
Also, you may need to delete and recreate /app/hadoop/tmp (mostly needed when I changed the Hadoop version from 2.2.0 to 2.7.0):
sudo rm -r /app/hadoop/tmp
sudo mkdir -p /app/hadoop/tmp
sudo chown hduser:hadoop /app/hadoop/tmp
sudo chmod 750 /app/hadoop/tmp
STEP 2: Format the NameNode:
hdfs namenode -format
Now I can see the DataNode:
hduser@prayagupd:~$ jps
19135 NameNode
20497 Jps
19477 DataNode
20447 NodeManager
19902 SecondaryNameNode
20106 ResourceManager
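One caveat with STEP 2: reformatting generates a fresh clusterID on the NameNode, and a DataNode still holding data from the old namespace will refuse to register. Comparing the two VERSION files shows whether that is the situation (the paths below assume the dfs.*.dir values from the question):

# On the master:
grep clusterID /usr/local/hadoop_store/hdfs/namenode/current/VERSION

# On the DataNode -- the two clusterID values must match:
grep clusterID /usr/local/hadoop_store/hdfs/datanode/current/VERSION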
I had the same problem after an improper shutdown of the node. I also checked in the UI, and the DataNode was not listed.
It is now working after deleting the files from the DataNode folder and restarting the services:
stop-all.sh
rm -rf /usr/local/hadoop_store/hdfs/datanode/*
start-all.sh
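Note that this wipes the DataNode's block data, so it is only appropriate when the data is disposable. To confirm the DataNode has registered after the restart, a report from the NameNode should now show one live node:

hdfs dfsadmin -report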