No data nodes are started

Tags:

2 Answers

That error you are getting in the DN log is described here: http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/#java-io-ioexception-incompatible-namespaceids

From that page:

At the moment, there seem to be two workarounds as described below.

Workaround 1: Start from scratch

I can testify that the following steps solve this error, but the side effects won’t make you happy (me neither). The crude workaround I have found is to:

Stop the cluster
Delete the data directory on the problematic DataNode: the directory is specified by dfs.data.dir in conf/hdfs-site.xml; if you followed this tutorial, the relevant directory is /app/hadoop/tmp/dfs/data
Reformat the NameNode (NOTE: all HDFS data is lost during this process!)
Restart the cluster

When deleting all the HDFS data and starting from scratch does not sound like a good idea (it might be ok during the initial setup/testing), you might give the second approach a try.

Workaround 2: Updating namespaceID of problematic DataNodes

Big thanks to Jared Stehler for the following suggestion. I have not tested it myself yet, but feel free to try it out and send me your feedback. This workaround is “minimally invasive” as you only have to edit one file on the problematic DataNodes:

Stop the DataNode
Edit the value of namespaceID in /current/VERSION to match the value of the current NameNode
Restart the DataNode

If you followed the instructions in my tutorials, the full path of the relevant files are:

NameNode: /app/hadoop/tmp/dfs/name/current/VERSION

DataNode: /app/hadoop/tmp/dfs/data/current/VERSION

(background: dfs.data.dir is by default set to

${hadoop.tmp.dir}/dfs/data, and we set hadoop.tmp.dir

in this tutorial to /app/hadoop/tmp).

If you wonder how the contents of VERSION look like, here’s one of mine:

# contents of /current/VERSION

namespaceID=393514426

storageID=DS-1706792599-10.10.10.1-50010-1204306713481

cTime=1215607609074

storageType=DATA_NODE

layoutVersion=-13

146

answered Oct 22 '22 00:10

Chris Shain

Okay, I post this once more:

In case someone needs this, for newer version of Hadoop (basically I am running 2.4.0)

In this case stop the cluster sbin/stop-all.sh
Then go to /etc/hadoop for config files.

In the file: hdfs-site.xml Look out for directory paths corresponding to dfs.namenode.name.dir dfs.namenode.data.dir

Delete both the directories recursively (rm -r).
Now format the namenode via bin/hadoop namenode -format
And finally sbin/start-all.sh

Hope this helps.

answered Oct 21 '22 23:10

apurva.nandan

Related questions
                            
                                Caused by: ERROR XSDB6: Another instance of Derby may have already booted the database
                            
                                Apache Hadoop Yarn - Underutilization of cores
                            
                                What is the purpose of "uber mode" in hadoop?
                            
                                Find port number where HDFS is listening
                            
                                Is there an equivalent to `pwd` in hdfs?
                            
                                how to replace characters in hive?
                            
                                Pyspark: get list of files/directories on HDFS path
                            
                                No such method exception Hadoop <init>
                            
                                Accessing stream output from hdfs of MRjob
                            
                                Add a column in a table in HIVE QL
                            
                                Difference between `hadoop dfs` and `hadoop fs` [closed]
                            
                                How to convert .txt file to Hadoop's sequence file format
                            
                                Hadoop speculative task execution
                            
                                Select top 2 rows in Hive
                            
                                apache spark - check if file exists
                            
                                Why do I need to source bash_profile every time
                            
                                Would Spark unpersist the RDD itself when it realizes it won't be used anymore?
                            
                                Alter hive table add or drop column
                            
                                Merging multiple files into one within Hadoop
                            
                                Hive query to quickly find table size (number of rows)

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

No data nodes are started

Tags:

hadoop

hdfs

Aaron S

People also ask

2 Answers

Chris Shain

apurva.nandan

Recent Activity

Donate For Us