
hdfs-site.xml for adding a new datanode

Tags:

hadoop

I have installed Hadoop 2.7.2 in pseudo-distributed mode (machine-1). I want to add a new datanode to it to turn it into a cluster. The problem is that the two machines have different disk partitions.

I installed the same Hadoop version (2.7.2) on the new datanode (machine-2), and it can ssh to machine-1. After googling many websites, the common advice in the tutorials is that both machines must have the same configuration files inside the /etc/hadoop/ folder.

With that said, my existing configuration on machine-1 is:

core-site.xml

    <configuration>
        <property>
                <name>hadoop.tmp.dir</name>
                <value>/home1/tmp</value>
                <description>A base for other temporary directories</description>
        </property>

        <property>
                <name>fs.default.name</name>
                <value>hdfs://CP000187:9000</value>
        </property>

        <property>
                <name>hadoop.proxyuser.vasanth.hosts</name>
                <value>*</value>
        </property>

        <property>
                <name>hadoop.proxyuser.vasanth.groups</name>
                <value>*</value>
        </property>
    </configuration>

hdfs-site.xml:

<configuration>
     <property>
            <name>dfs.replication</name>
            <value>1</value>
     </property>
     <property>
            <name>dfs.namenode.name.dir</name>
            <value>file:/home1/hadoop_data/hdfs/namenode</value>
     </property>
     <property>
            <name>dfs.datanode.data.dir</name>
            <value>file:/home1/hadoop_store/hdfs/datanode</value>
     </property>
     <property>
            <name>dfs.permissions</name>
            <value>false</value>
     </property>
</configuration>

/home1 is a disk mounted on machine-1.

Machine-2 has two disks mounted, namely /hdd1 and /hdd2.

Now, what should I specify in hdfs-site.xml on the new machine (machine-2) to make use of both /hdd1 and /hdd2?

Should the value of dfs.data.dir be the same on all nodes?

Is the dfs.namenode.name.dir property required in hdfs-site.xml on machine-2 (since it is not a namenode)?

My simplified question: is it mandatory to replicate the master node's configuration files on the slave nodes as well? Please help me out with this.

Asked by M.Prabhu on Oct 27 '25

1 Answer

You just need to copy the entire Hadoop folder from node1 to node2, so that the configuration on both nodes points to hdfs://CP000187:9000. You don't have to change any additional settings on node2.
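To use both disks on machine-2, note that dfs.datanode.data.dir accepts a comma-separated list of directories, and the datanode spreads its blocks across all of them. A minimal sketch of machine-2's hdfs-site.xml, assuming the directories /hdd1/hadoop_store/hdfs/datanode and /hdd2/hadoop_store/hdfs/datanode exist and are writable by the hadoop user (the paths are illustrative, not from the original question):

```xml
<configuration>
     <property>
            <name>dfs.replication</name>
            <value>1</value>
     </property>
     <property>
            <!-- comma-separated list: the datanode stores blocks on both disks -->
            <name>dfs.datanode.data.dir</name>
            <value>file:/hdd1/hadoop_store/hdfs/datanode,file:/hdd2/hadoop_store/hdfs/datanode</value>
     </property>
</configuration>
```

As for dfs.namenode.name.dir: it is only read by the namenode process, so leaving it in the copied file on machine-2 is harmless; a machine running only a datanode simply ignores it.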

To start the datanode on node2, run the following from the sbin directory. You only need to run the datanode and nodemanager processes on node2:

./hadoop-daemon.sh start datanode
./yarn-daemon.sh start nodemanager

To check whether the datanode was added correctly, run dfsadmin -report on node1 (in newer releases, hdfs dfsadmin -report does the same without a deprecation warning):

hadoop dfsadmin -report

Output :

Configured Capacity: 24929796096 (23.22 GB)
Present Capacity: 17852575744 (16.63 GB)
DFS Remaining: 17851076608 (16.63 GB)
DFS Used: 1499136 (1.43 MB)
DFS Used%: 0.01%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0

-------------------------------------------------
Live datanodes (2):
Answered by sterin jacob on Oct 30 '25


