I am trying to learn Hadoop by following a tutorial, and I am setting up pseudo-distributed mode on my machine.
My core-site.xml is:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
    <description>The name of the default file system. A URI whose scheme and authority determine the FileSystem implementation.</description>
  </property>
</configuration>
My hdfs-site.xml file is:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
    <description>The actual number of replications can be specified when the file is created.</description>
  </property>
</configuration>
My mapred-site.xml file is:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
    <description>The host and port that the MapReduce job tracker runs at.</description>
  </property>
</configuration>
When I run the following command it completes successfully, but what is it actually doing?
hadoop-1.2.1$ bin/hadoop namenode -format
14/11/26 12:37:16 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = myhost/127.0.0.8
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 1.2.1
STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.2 -r 1503152; compiled by 'mattf' on Mon Jul 22 15:23:09 PDT 2013
STARTUP_MSG:   java = 1.6.0_45
************************************************************/
14/11/26 12:37:17 INFO util.GSet: Computing capacity for map BlocksMap
14/11/26 12:37:17 INFO util.GSet: VM type = 64-bit
14/11/26 12:37:17 INFO util.GSet: 2.0% max memory = 932118528
14/11/26 12:37:17 INFO util.GSet: capacity = 2^21 = 2097152 entries
14/11/26 12:37:17 INFO util.GSet: recommended=2097152, actual=2097152
14/11/26 12:37:17 INFO namenode.FSNamesystem: fsOwner=myuser
14/11/26 12:37:17 INFO namenode.FSNamesystem: supergroup=supergroup
14/11/26 12:37:17 INFO namenode.FSNamesystem: isPermissionEnabled=true
14/11/26 12:37:17 INFO namenode.FSNamesystem: dfs.block.invalidate.limit=100
14/11/26 12:37:17 INFO namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
14/11/26 12:37:17 INFO namenode.FSEditLog: dfs.namenode.edits.toleration.length = 0
14/11/26 12:37:17 INFO namenode.NameNode: Caching file names occuring more than 10 times
14/11/26 12:37:17 INFO common.Storage: Image file /tmp/hadoop-myuser/dfs/name/current/fsimage of size 115 bytes saved in 0 seconds.
14/11/26 12:37:18 INFO namenode.FSEditLog: closing edit log: position=4, editlog=/tmp/hadoop-myuser/dfs/name/current/edits
14/11/26 12:37:18 INFO namenode.FSEditLog: close success: truncate to 4, editlog=/tmp/hadoop-myuser/dfs/name/current/edits
14/11/26 12:37:18 INFO common.Storage: Storage directory /tmp/hadoop-myuser/dfs/name has been successfully formatted.
14/11/26 12:37:18 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at chaitanya-OptiPlex-3010/127.0.0.8
************************************************************/
Can someone please explain what it is doing internally?
I have gone through these posts but found no clear explanation:
What exactly is hadoop namenode formatting?
hadoop namenode is not formatting
How can I check this practically on my machine, so that I can see the differences before and after running the command? I am new to Hadoop, so this may be a trivial question.
Answer (1)
The fs.default.name property in core-site.xml specifies the default file system. Out of the box it points to the local file system, so you need to set it to an HDFS address such as hdfs://localhost:9000. Clients use this address to find the NameNode.
Formatting a NameNode that already holds namespaces can result in data loss. In a Hadoop 2 federated or HA cluster (this does not apply to the single NameNode of Hadoop 1.2.1), format the active NameNode by specifying the cluster ID, which must be the same as that of the existing namespaces, and then bootstrap the standby NameNode.
When we format the NameNode, it re-initializes the metadata it keeps about the DataNodes. After that, the NameNode no longer knows about any data stored on the DataNodes, so that data is effectively lost and the DataNodes can be reused for new data.
Run the command % $HADOOP_INSTALL/hadoop/bin/start-dfs.sh on the node where you want the NameNode to run. This brings up HDFS with the NameNode running on that machine and DataNodes running on the machines listed in the slaves file.
hadoop namenode -format
This command wipes the NameNode's metadata, so all files in your HDFS are lost.
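The tmp directory on the local filesystem contains two folders, one for the NameNode and one for the DataNode. Formatting the NameNode resets the NameNode folder; the DataNode folder is not cleared by the format itself, which is why the note below deletes it by hand.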
Note: if you want to format your NameNode, first stop all Hadoop services, then delete the tmp folder (which contains the NameNode and DataNode folders) in your local file system, and then start the Hadoop services again so that the change takes effect.
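The steps in the note can be sketched as a small shell function. This is only a sketch under assumptions: reset_hdfs is a made-up helper name, stop-all.sh and start-all.sh are the scripts shipped in bin/ of a Hadoop 1.x install, and the function also re-runs the format before restarting, on the assumption that a NameNode cannot start from an empty storage directory (the note itself does not list that step).

```shell
#!/bin/sh
# reset_hdfs HADOOP_HOME TMP_DIR  (hypothetical helper, not part of Hadoop)
# Stops the daemons, removes TMP_DIR, re-formats the NameNode, restarts.
reset_hdfs() {
    hadoop_home="$1"
    tmp_dir="$2"

    # Refuse to run rm -rf on an empty path or on the root directory.
    case "$tmp_dir" in
        ""|"/") echo "refusing to delete '$tmp_dir'" >&2; return 1 ;;
    esac

    # 1. Stop all Hadoop services.
    "$hadoop_home/bin/stop-all.sh" || return 1

    # 2. Delete the tmp folder holding the NameNode and DataNode folders.
    rm -rf "$tmp_dir"

    # 3. Re-create the NameNode metadata (assumed step, see lead-in).
    "$hadoop_home/bin/hadoop" namenode -format || return 1

    # 4. Start the services again.
    "$hadoop_home/bin/start-all.sh"
}

# Example call (paths are assumptions; adjust to your install):
# reset_hdfs "$HOME/hadoop-1.2.1" "/tmp/hadoop-$(id -un)"
```

Everything in HDFS is gone after this, so it is only suitable for a throwaway learning setup.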
Reason for hadoop namenode -format:
The Hadoop NameNode is the central component of an HDFS file system: it keeps the directory tree of all files in the file system and tracks where across the cluster the file data is kept. In short, it keeps the metadata related to the DataNodes. Formatting the NameNode re-initializes this metadata. After that, the NameNode no longer knows about any data stored on the DataNodes, so that data is effectively lost and the DataNodes can be reused for new data.
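One piece of that metadata you can look at directly is the namespace ID. As far as I know, Hadoop 1.x writes it to a VERSION file under current/ in each storage directory and generates a new one on every format. A minimal sketch for reading it follows; read_namespace_id is a made-up helper name, and the VERSION layout is an assumption about Hadoop 1.x, so check the file on your own machine.

```shell
#!/bin/sh
# read_namespace_id DIR  (hypothetical helper, not part of Hadoop)
# Prints the namespaceID recorded in DIR/current/VERSION.
read_namespace_id() {
    version_file="$1/current/VERSION"
    if [ ! -f "$version_file" ]; then
        echo "no VERSION file in $1" >&2
        return 1
    fi
    # VERSION is a plain key=value properties file.
    sed -n 's/^namespaceID=//p' "$version_file"
}

# Example calls (default Hadoop 1.x locations for user "myuser"):
# read_namespace_id /tmp/hadoop-myuser/dfs/name   # NameNode
# read_namespace_id /tmp/hadoop-myuser/dfs/data   # DataNode
```

Run it on the NameNode directory before and after formatting and the ID should change. If the DataNode directory still carries the old ID, the DataNode typically refuses to start and reports incompatible namespace IDs in its log.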
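By default the NameNode storage location is "/tmp/hadoop-myuser/dfs/name", that is, ${hadoop.tmp.dir}/dfs/name with hadoop.tmp.dir defaulting to /tmp/hadoop-${user.name}.
When you formatted the NameNode, this location was cleared and re-initialized, as the last lines of your log show.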
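To see the difference before and after on your own machine, you can snapshot that directory around the format command and diff the two snapshots. A rough sketch, assuming the default location from the log; snapshot_dir is a made-up helper name and the output file paths are arbitrary.

```shell
#!/bin/sh
# Default NameNode directory for the current user (assumption: hadoop.tmp.dir
# was not overridden). Set NAME_DIR yourself if you changed it.
NAME_DIR="${NAME_DIR:-/tmp/hadoop-$(id -un)/dfs/name}"

# snapshot_dir DIR OUTFILE  (hypothetical helper, not part of Hadoop)
# Writes one "checksum size path" line per file under DIR to OUTFILE,
# or a marker line if DIR does not exist yet.
snapshot_dir() {
    if [ -d "$1" ]; then
        ( cd "$1" && find . -type f | sort |
            while IFS= read -r f; do cksum "$f"; done ) > "$2"
    else
        echo "(directory does not exist)" > "$2"
    fi
}

# Example session:
# snapshot_dir "$NAME_DIR" /tmp/name-before.txt
# bin/hadoop namenode -format
# snapshot_dir "$NAME_DIR" /tmp/name-after.txt
# diff /tmp/name-before.txt /tmp/name-after.txt
```

On a first format the "before" snapshot is just the marker line and the "after" snapshot lists the freshly written files, such as current/fsimage and current/edits from the log. On a re-format the checksums of those files should change.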
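To change the NameNode location, add the following properties to hdfs-site.xml. (dfs.namenode.name.dir and dfs.datanode.data.dir are the Hadoop 2 property names; on Hadoop 1.2.1 the equivalent properties are dfs.name.dir and dfs.data.dir.)
<property>
  <name>dfs.namenode.name.dir</name>
  <value>file:/search/data/dfs/namenode</value>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>file:/search/data/dfs/datanode</value>
</property>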
I hope this will help you.. :-)