
No data nodes are started

Tags: hadoop, hdfs

I am trying to set up Hadoop version 0.20.203.0 in a pseudo-distributed configuration using the following guide:

http://www.javacodegeeks.com/2012/01/hadoop-modes-explained-standalone.html

After running the start-all.sh script, I run "jps" and get this output:

4825 NameNode
5391 TaskTracker
5242 JobTracker
5477 Jps
5140 SecondaryNameNode

When I try to add data to HDFS using:

bin/hadoop fs -put conf input

I get this error:

hadoop@m1a2:~/software/hadoop$ bin/hadoop fs -put conf input
12/04/10 18:15:31 WARN hdfs.DFSClient: DataStreamer Exception: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /user/hadoop/input/core-site.xml could only be replicated to 0 nodes, instead of 1
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1417)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:596)
    at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:616)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:523)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1383)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1379)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:416)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1377)
    at org.apache.hadoop.ipc.Client.call(Client.java:1030)
    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:224)
    at $Proxy1.addBlock(Unknown Source)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:616)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
    at $Proxy1.addBlock(Unknown Source)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:3104)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2975)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2255)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2446)
12/04/10 18:15:31 WARN hdfs.DFSClient: Error Recovery for block null bad datanode[0] nodes == null
12/04/10 18:15:31 WARN hdfs.DFSClient: Could not get block locations. Source file "/user/hadoop/input/core-site.xml" - Aborting...
put: java.io.IOException: File /user/hadoop/input/core-site.xml could only be replicated to 0 nodes, instead of 1
12/04/10 18:15:31 ERROR hdfs.DFSClient: Exception closing file /user/hadoop/input/core-site.xml : org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /user/hadoop/input/core-site.xml could only be replicated to 0 nodes, instead of 1
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1417)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:596)
    at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:616)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:523)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1383)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1379)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:416)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1377)
org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /user/hadoop/input/core-site.xml could only be replicated to 0 nodes, instead of 1
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1417)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:596)
    at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:616)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:523)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1383)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1379)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:416)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1377)
    at org.apache.hadoop.ipc.Client.call(Client.java:1030)
    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:224)
    at $Proxy1.addBlock(Unknown Source)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:616)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
    at $Proxy1.addBlock(Unknown Source)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:3104)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2975)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2255)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2446)

I am not totally sure, but I suspect this happens because the DataNode is not running.
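A quick way to confirm that (a sketch; the log location assumes the default $HADOOP_HOME/logs directory of a tarball install):

  # Check whether a DataNode process is up and, if not, read its log.
  # The log path is an assumption for a default install; adjust if
  # HADOOP_LOG_DIR points somewhere else.
  jps | grep DataNode || echo "DataNode is not running"
  tail -n 50 $HADOOP_HOME/logs/hadoop-*-datanode-*.log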

Does anybody know what I have done wrong, or how to fix this problem?

EDIT: This is the datanode.log file:

2012-04-11 12:27:28,977 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting DataNode
STARTUP_MSG:   host = m1a2/139.147.5.55
STARTUP_MSG:   args = []
STARTUP_MSG:   version = 0.20.203.0
STARTUP_MSG:   build = http://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20-security-203 -r 1099333; compiled by 'oom' on Wed May  4 07:57:50 PDT 2011
************************************************************/
2012-04-11 12:27:29,166 INFO org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
2012-04-11 12:27:29,181 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source MetricsSystem,sub=Stats registered.
2012-04-11 12:27:29,183 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
2012-04-11 12:27:29,183 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: DataNode metrics system started
2012-04-11 12:27:29,342 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source ugi registered.
2012-04-11 12:27:29,347 WARN org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Source name ugi already exists!
2012-04-11 12:27:29,615 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.IOException: Incompatible namespaceIDs in /tmp/hadoop-hadoop/dfs/data: namenode namespaceID = 301052954; datanode namespaceID = 229562149
    at org.apache.hadoop.hdfs.server.datanode.DataStorage.doTransition(DataStorage.java:232)
    at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:147)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:354)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:268)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1480)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1419)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1437)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:1563)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1573)
2012-04-11 12:27:29,617 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down DataNode at m1a2/139.147.5.55
************************************************************/
Asked Apr 10 '12 by Aaron S




2 Answers

The error you are getting in the DataNode log is described here: http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/#java-io-ioexception-incompatible-namespaceids

From that page:

At the moment, there seem to be two workarounds as described below.

Workaround 1: Start from scratch

I can testify that the following steps solve this error, but the side effects won’t make you happy (me neither). The crude workaround I have found is to:

  1. Stop the cluster
  2. Delete the data directory on the problematic DataNode: the directory is specified by dfs.data.dir in conf/hdfs-site.xml; if you followed this tutorial, the relevant directory is /app/hadoop/tmp/dfs/data
  3. Reformat the NameNode (NOTE: all HDFS data is lost during this process!)
  4. Restart the cluster
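On a pseudo-distributed, single-node setup like this one, those four steps boil down to roughly the following sketch (the data directory is the tutorial's default; substitute whatever your dfs.data.dir actually points to, and remember this erases everything stored in HDFS):

  # Sketch of Workaround 1 -- the data directory is the tutorial's default;
  # replace it with your own dfs.data.dir. This wipes ALL data stored in HDFS.
  bin/stop-all.sh
  rm -rf /app/hadoop/tmp/dfs/data    # storage directory of the problematic DataNode
  bin/hadoop namenode -format        # re-initializes HDFS; confirm when prompted
  bin/start-all.sh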

If deleting all the HDFS data and starting from scratch does not sound like a good idea (it might be OK during initial setup/testing), you might give the second approach a try.

Workaround 2: Updating namespaceID of problematic DataNodes

Big thanks to Jared Stehler for the following suggestion. I have not tested it myself yet, but feel free to try it out and send me your feedback. This workaround is “minimally invasive” as you only have to edit one file on the problematic DataNodes:

  1. Stop the DataNode
  2. Edit the value of namespaceID in /current/VERSION to match the value of the current NameNode
  3. Restart the DataNode

If you followed the instructions in my tutorials, the full paths of the relevant files are:

NameNode: /app/hadoop/tmp/dfs/name/current/VERSION

DataNode: /app/hadoop/tmp/dfs/data/current/VERSION

(Background: dfs.data.dir is by default set to ${hadoop.tmp.dir}/dfs/data, and we set hadoop.tmp.dir in this tutorial to /app/hadoop/tmp.)

If you wonder what the contents of VERSION look like, here's one of mine:

# contents of /current/VERSION
namespaceID=393514426
storageID=DS-1706792599-10.10.10.1-50010-1204306713481
cTime=1215607609074
storageType=DATA_NODE
layoutVersion=-13
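Putting those three steps together, a rough shell sketch (the /app/hadoop/tmp paths are the tutorial's defaults from above and the sed call assumes GNU sed; adjust both to your own dfs.name.dir and dfs.data.dir):

  # Sketch of Workaround 2 -- copy the NameNode's namespaceID into the
  # DataNode's VERSION file. Paths are the tutorial's defaults; adjust them.

  # 1. Stop the DataNode
  bin/hadoop-daemon.sh stop datanode

  # 2. Make the DataNode's namespaceID match the NameNode's
  NN_ID=$(grep '^namespaceID=' /app/hadoop/tmp/dfs/name/current/VERSION | cut -d= -f2)
  sed -i "s/^namespaceID=.*/namespaceID=${NN_ID}/" /app/hadoop/tmp/dfs/data/current/VERSION

  # 3. Restart the DataNode
  bin/hadoop-daemon.sh start datanode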

Answered by Chris Shain


Okay, I'll post this once more:

In case someone needs this for a newer version of Hadoop (I am running 2.4.0):

  • First, stop the cluster: sbin/stop-all.sh

  • Then go to etc/hadoop (under your Hadoop installation directory) for the config files. In hdfs-site.xml, look for the directory paths configured under dfs.namenode.name.dir and dfs.datanode.data.dir.

  • Delete both directories recursively (rm -r).

  • Now format the namenode via bin/hadoop namenode -format

  • And finally sbin/start-all.sh
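Put together, a rough sketch of those steps (the two /path/to directories are placeholders; use the paths your own hdfs-site.xml configures, and note that all HDFS data is lost):

  # Sketch for Hadoop 2.x -- /path/to/... are placeholders; substitute the
  # directories configured in your hdfs-site.xml. All HDFS data is removed.
  sbin/stop-all.sh
  rm -r /path/to/dfs/namenode /path/to/dfs/datanode
  bin/hadoop namenode -format      # or: bin/hdfs namenode -format
  sbin/start-all.sh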

Hope this helps.

Answered by apurva.nandan