
Why does the Hadoop incompatible namespaceIDs issue happen?

Tags:

hadoop

This is a fairly well-documented error and the fix is easy, but does anyone know why Hadoop datanode namespaceIDs get out of sync so easily, or how Hadoop assigns the namespaceIDs when it starts up the datanodes?

Here's the error:

2010-08-06 12:12:06,900 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.IOException: Incompatible namespaceIDs in /Users/jchen/Data/Hadoop/dfs/data: namenode namespaceID = 773619367; datanode namespaceID = 2049079249
    at org.apache.hadoop.hdfs.server.datanode.DataStorage.doTransition(DataStorage.java:233)
    at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:148)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:298)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:216)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1283)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1238)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1246)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1368)

This seems to happen even on single-node instances.

Jieren asked Aug 06 '10 16:08




1 Answer

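The namenode generates a new namespaceID every time you format HDFS. I think this is to distinguish the current file system instance from previous ones: you can roll back to a previous version if something goes wrong, which might not be possible if the namespaceID were not unique to each formatted instance.

The namespaceID also ties the namenode and datanodes together. Each datanode records the namenode's namespaceID in its own storage directory (in the current/VERSION file) when it first joins, and checks it against the namenode's ID on every startup. If you reformat the namenode without clearing the datanode's data directory, the datanode still holds the old ID and fails with the error above. That is why it can happen even on a single-node setup.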
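To see how the two IDs end up out of sync, here is a rough Python sketch that reads the namespaceID from a storage directory and compares the namenode's value with the datanode's. It assumes each storage directory has a current/VERSION file in key=value form with a namespaceID entry. The function names are made up for illustration and the sample text is a trimmed-down stand-in for a real VERSION file, not a copy of one.

```python
# Sketch only: compare the namespaceID recorded by a namenode and a datanode.
# Assumption: <storage_dir>/current/VERSION holds key=value lines, one of
# which is namespaceID. Function names here are hypothetical helpers.
from pathlib import Path


def parse_version(text):
    """Parse key=value lines, skipping blank lines and '#' comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def read_namespace_id(storage_dir):
    """Return the namespaceID stored under <storage_dir>/current/VERSION."""
    text = (Path(storage_dir) / "current" / "VERSION").read_text()
    return int(parse_version(text)["namespaceID"])


def describe(namenode_id, datanode_id):
    """Report whether the two IDs agree, mimicking the wording of the error."""
    if namenode_id == datanode_id:
        return "namespaceIDs match"
    return ("Incompatible namespaceIDs: namenode namespaceID = %d; "
            "datanode namespaceID = %d" % (namenode_id, datanode_id))


# Illustrative stand-ins for the two VERSION files, using the IDs from the
# error in the question. Real files contain more fields than shown here.
SAMPLE_NAMENODE = "#sample\nnamespaceID=773619367\nstorageType=NAME_NODE\n"
SAMPLE_DATANODE = "#sample\nnamespaceID=2049079249\nstorageType=DATA_NODE\n"

if __name__ == "__main__":
    nn_id = int(parse_version(SAMPLE_NAMENODE)["namespaceID"])
    dn_id = int(parse_version(SAMPLE_DATANODE)["namespaceID"])
    print(describe(nn_id, dn_id))
```

On a real install you would call read_namespace_id on your own name directory and data directory (for the question above, the data directory is /Users/jchen/Data/Hadoop/dfs/data) instead of using the sample strings.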

Harsha Hulageri answered Oct 26 '22 12:10