Where is the configuration file for HDFS in Hadoop 2.2.0?

I'm studying Hadoop and I'm currently trying to set up a Hadoop 2.2.0 single node. I downloaded the latest distribution, uncompressed it, and now I'm trying to set up the Hadoop Distributed File System (HDFS).

Now, I'm trying to follow the Hadoop instructions available here, but I'm quite lost.

In the left sidebar you can see references to the following files:

  • core-default.xml
  • hdfs-default.xml
  • mapred-default.xml
  • yarn-default.xml

But where are those files?

I found /etc/hadoop/hdfs-site.xml, but it is empty!

I found /share/doc/hadoop/hadoop-project-dist/hadoop-common/core-default.xml, but it is just a piece of documentation!

So, which files do I have to modify to configure HDFS? And where are the default values read from?

Thanks in advance for your help.

asked Jan 26 '14 by danidemi


People also ask

Does hadoop have a configuration file?

Hadoop configuration is driven by two types of important configuration files: read-only defaults (core-default.xml, hdfs-default.xml, mapred-default.xml, yarn-default.xml) and site-specific configuration (core-site.xml, hdfs-site.xml, mapred-site.xml, yarn-site.xml).

Which of the file contains the configuration setting for HDFS commands?

The hdfs-site.xml file contains the configuration settings for the HDFS daemons: the NameNode, the Secondary NameNode, and the DataNodes.

Which configuration file is used for hadoop core config settings that are common to HDFS and MapReduce?

core-site.xml — This configuration file contains Hadoop core configuration settings, for example I/O settings, that are common to MapReduce and HDFS. It specifies the hostname and port of the default filesystem.

How can I change hadoop configuration?

To change the default value, edit the /etc/hadoop/conf/hadoop-env.sh file and change the -XX:MaxNewSize parameter to 1/8th of the maximum heap size (-Xmx). It is also recommended to set -XX:NewSize to the same value as -XX:MaxNewSize.
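As a concrete sketch (the 4 GB heap here is only an assumed example; use your NameNode's actual -Xmx value), such JVM flags can be passed via the HADOOP_NAMENODE_OPTS variable in hadoop-env.sh:

# /etc/hadoop/conf/hadoop-env.sh (path varies by distribution)
# assumed example: heap -Xmx is 4096m, so NewSize/MaxNewSize = 512m (1/8 of the heap)
export HADOOP_NAMENODE_OPTS="-Xmx4096m -XX:NewSize=512m -XX:MaxNewSize=512m ${HADOOP_NAMENODE_OPTS}"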


2 Answers

These files are all found in Hadoop's configuration directory: etc/hadoop inside the Hadoop 2.x tarball (it was conf/ in Hadoop 1.x).

To set up HDFS you have to configure core-site.xml and hdfs-site.xml.

HDFS works in two modes: distributed (a multi-node cluster) and pseudo-distributed (all daemons on a single machine).

For the pseudo-distributed mode you have to configure:

In core-site.xml:

<!-- namenode -->
<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:8020</value>
</property>
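Note that properties must sit inside a <configuration> root element. Also, in Hadoop 2.x the key fs.default.name is deprecated in favor of fs.defaultFS (both still work), so a complete minimal core-site.xml would look like:

<?xml version="1.0"?>
<configuration>
  <!-- fs.defaultFS is the Hadoop 2.x name; fs.default.name is the deprecated alias -->
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:8020</value>
  </property>
</configuration>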

In hdfs-site.xml:

<!-- storage directory for HDFS - this overrides hadoop.tmp.dir, whose default is /tmp/hadoop-${user.name} -->
<property>
    <name>hadoop.tmp.dir</name>
    <value>/your-dir/</value>
</property>
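For a single-node setup you may also want to set the replication factor and the NameNode/DataNode storage directories explicitly instead of relying on hadoop.tmp.dir. A minimal sketch, using the Hadoop 2.x property names (the /your-dir/... paths are placeholders, as above):

<?xml version="1.0"?>
<configuration>
  <!-- a replication factor of 1 makes sense on a single-node (pseudo-distributed) setup -->
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <!-- where the NameNode keeps the filesystem image and edit log -->
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///your-dir/hdfs/namenode</value>
  </property>
  <!-- where the DataNodes store the actual blocks -->
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///your-dir/hdfs/datanode</value>
  </property>
</configuration>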

Each property has a hardcoded default value: the *-default.xml files are bundled inside the Hadoop jars, and the copies under /share/doc are just their documentation. Your *-site.xml files only need to contain the properties you want to override.
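To check which value is actually in effect for a given key, you can query the configuration with hdfs getconf (part of the standard Hadoop 2.x CLI; dfs.replication below is just an example key):

# prints the effective value: the default, unless overridden in your *-site.xml
hdfs getconf -confKey dfs.replication

# prints the configured NameNode host(s)
hdfs getconf -namenodes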

Please remember to set up passwordless SSH login for the hadoop user before starting HDFS.
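A minimal sketch of that setup, following the standard Hadoop single-node guide (run as the user that will start the daemons), together with the one-time NameNode format and HDFS start-up:

# generate a key with an empty passphrase and authorize it for localhost
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 0600 ~/.ssh/authorized_keys
ssh localhost   # should now log in without asking for a password

# format the NameNode once, then start the HDFS daemons
hdfs namenode -format
start-dfs.sh    # found in sbin/ of the Hadoop 2.x tarball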

P.S.

If you download Hadoop from Apache, you can consider switching to a Hadoop distribution:

Cloudera's CDH, Hortonworks HDP, or MapR.

If you install Cloudera CDH or Hortonworks HDP you will find the files in /etc/hadoop/conf/.

answered Oct 16 '22 by Evgeny Benediktov


For Hortonworks (HDP) the location would be:

/etc/hadoop/conf/hdfs-site.xml
answered Oct 16 '22 by Indrajeet Gour