
There are 0 datanode(s) running and no node(s) are excluded in this operation

I have set up a multi-node Hadoop cluster. The NameNode and SecondaryNameNode run on the same machine, and the cluster has only one DataNode. All the nodes are configured on Amazon EC2 machines.

Following are the configuration files on the master node:

masters

54.68.218.192 (public IP of the master node)

slaves

54.68.169.62 (public IP of the slave node)

core-site.xml

<configuration>
    <property>
        <name>fs.default.name</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>

mapred-site.xml

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>

hdfs-site.xml

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/usr/local/hadoop_store/hdfs/namenode</value>
    </property>
    <property>
        <name>dfs.datanode.name.dir</name>
        <value>file:/usr/local/hadoop_store/hdfs/datanode</value>
    </property>
</configuration>

Following are the configuration files on the datanode:

core-site.xml

<configuration>
    <property>
        <name>fs.default.name</name>
        <value>hdfs://54.68.218.192:10001</value>
    </property>
</configuration>

mapred-site.xml

<configuration>
    <property>
        <name>mapred.job.tracker</name>
        <value>54.68.218.192:10002</value>
    </property>
</configuration>

hdfs-site.xml

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/usr/local/hadoop_store/hdfs/namenode</value>
    </property>
    <property>
        <name>dfs.datanode.name.dir</name>
        <value>file:/usr/local/hadoop_store/hdfs/datanode</value>
    </property>
</configuration>
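Worth noting: the master's core-site.xml above points fs.default.name at hdfs://localhost:9000, while the datanode's points at hdfs://54.68.218.192:10001, so the two machines do not agree on a single NameNode address. A minimal sketch of a core-site.xml that both nodes could share; port 9000 and the use of the master's public IP here are assumptions, not settings taken from this setup:

<configuration>
    <property>
        <!-- Must be identical on every node; "localhost" on the master
             is not reachable from the slave -->
        <name>fs.default.name</name>
        <value>hdfs://54.68.218.192:9000</value>
    </property>
</configuration>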

jps run on the NameNode gives the following:

5696 NameNode
6504 Jps
5905 SecondaryNameNode
6040 ResourceManager

and jps on the DataNode:

2883 DataNode
3496 Jps
3381 NodeManager

which to me seems right.

Now when I try to run a put command:

hadoop fs -put count_inputfile /test/input/ 

It gives me the following error:

put: File /count_inputfile._COPYING_ could only be replicated to 0 nodes instead of minReplication (=1).  There are 0 datanode(s) running and no node(s) are excluded in this operation. 

The logs on the datanode say the following:

hadoop-datanode log:

INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 54.68.218.192/54.68.218.192:10001. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)

yarn-nodemanager log:

INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8031. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 
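That 0.0.0.0:8031 target is telling: 8031 is the ResourceManager's resource-tracker port, and the host falls back to 0.0.0.0 when yarn.resourcemanager.hostname is not set on the slave. A minimal yarn-site.xml sketch for the slave; pointing it at the master's IP is an assumption (a resolvable hostname would do as well):

<configuration>
    <property>
        <!-- Where the NodeManager should find the ResourceManager;
             the default host is 0.0.0.0 -->
        <name>yarn.resourcemanager.hostname</name>
        <value>54.68.218.192</value>
    </property>
</configuration>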

The NameNode web UI (port 50070) shows that there are 0 live nodes and 0 dead nodes, and that DFS Used is 100%.

I have also disabled IPV6.

On a few websites I found that I should also edit the /etc/hosts file. I have edited it, and it looks like this:

127.0.0.1       localhost
172.31.25.151   ip-172-31-25-151.us-west-2.compute.internal
172.31.25.152   ip-172-31-25-152.us-west-2.compute.internal

Why am I still getting the error?

asked Oct 24 '14 by Learner



2 Answers

Two things worked for me:

STEP 1: stop Hadoop and clean temp files from hduser

sudo rm -R /tmp/* 

Also, you may need to delete and recreate /app/hadoop/tmp (mostly when I change the Hadoop version, e.g. from 2.2.0 to 2.7.0):

sudo rm -r /app/hadoop/tmp
sudo mkdir -p /app/hadoop/tmp
sudo chown hduser:hadoop /app/hadoop/tmp
sudo chmod 750 /app/hadoop/tmp
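For context, /app/hadoop/tmp is presumably the value of hadoop.tmp.dir in core-site.xml, under which HDFS keeps its name and data directories by default; the property would look roughly like this (the path comes from the commands above, and its role here is an assumption):

<property>
    <!-- Base directory for Hadoop's temporary storage; the namenode and
         datanode directories default to subdirectories of it -->
    <name>hadoop.tmp.dir</name>
    <value>/app/hadoop/tmp</value>
</property>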

STEP 2: format the NameNode

hdfs namenode -format 
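The answer doesn't show it, but the daemons need to be started again after the format before jps can show them; with the standard Hadoop 2.x sbin scripts that would be:

start-dfs.sh
start-yarn.sh

Note that formatting assigns the NameNode a new clusterID, so a DataNode directory initialized under the old ID may refuse to register afterwards; clearing it, as the second answer does, resolves that.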

Now I can see the DataNode:

hduser@prayagupd:~$ jps
19135 NameNode
20497 Jps
19477 DataNode
20447 NodeManager
19902 SecondaryNameNode
20106 ResourceManager
answered Sep 24 '22 by prayagupa


I had the same problem after an improper shutdown of the node. I also checked in the UI, and the DataNode was not listed.

It is working now, after deleting the files from the datanode folder and restarting the services:

stop-all.sh

rm -rf /usr/local/hadoop_store/hdfs/datanode/*

start-all.sh
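To confirm the DataNode actually registered after the restart, a quick check (a standard HDFS admin command, not part of the original answer) is:

hdfs dfsadmin -report

The report should now list one live datanode along with its capacity and usage.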

answered Sep 23 '22 by Tamilkumaran S