
How can I increase HDFS capacity?

Tags:

hadoop

hdfs

How can I increase the configured capacity of my Hadoop DFS from the default 50GB to 100GB?

My present setup is Hadoop 1.2.1 running on a CentOS 6 machine with 120GB of 450GB used. I have set up Hadoop in pseudo-distributed mode with the /conf suggested by "Hadoop: The Definitive Guide" (3rd edition). hdfs-site.xml had only one configured property:

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>

The following command gave no error feedback; it just returns to the prompt:

hadoop dfsadmin -setSpaceQuota 100g  /tmp/hadoop-myUserID
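(As far as I understand, -setSpaceQuota only attaches a space quota to that HDFS directory; it does not change the configured capacity that dfsadmin -report shows. If the quota did get applied, it should be visible with the count command:

# show name/space quotas and remaining quotas for the directory
hadoop fs -count -q /tmp/hadoop-myUserID

where the first four columns are the name and space quotas and how much of each remains.)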

If I am in a regeneration loop (i.e. I have executed

 rm -rf /tmp/hadoop-myUserId

in an attempt to "start from scratch"), this apparent success of setSpaceQuota occurs if and only if I have executed

  start-all.sh
  hadoop namenode -format

The failure of my DFS capacity configuration is shown by

 hadoop dfsadmin -report

which shows the same 50GB of configured capacity.

I would be willing to switch over to Hadoop 2.2 (now the stable release) if that is currently the best way to get 100GB of configured HDFS capacity. It seems like there should be a configuration property in hdfs-site.xml that would let me use more of my free partition.
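For reference (going by the Hadoop 1.x defaults as far as I can tell), the property that decides where DataNode block storage lives is dfs.data.dir, which defaults to ${hadoop.tmp.dir}/dfs/data; in Hadoop 2.x it is called dfs.datanode.data.dir. A rough hdfs-site.xml sketch, assuming a hypothetical /home/myUserID/hdfs/data directory on the larger partition:

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <!-- Hadoop 1.x name; in Hadoop 2.x use dfs.datanode.data.dir instead -->
    <name>dfs.data.dir</name>
    <!-- hypothetical directory on the larger /home partition -->
    <value>/home/myUserID/hdfs/data</value>
  </property>
</configuration>

The answer below takes the simpler route of moving hadoop.tmp.dir itself, which relocates both the NameNode and DataNode directories at once.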

asked Oct 23 '13 by teserecter




1 Answer

Set the location of HDFS to a partition with more free space. For hadoop-1.2.1 this can be done by setting hadoop.tmp.dir in hadoop-1.2.1/conf/core-site.xml:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/myUserID/hdfs</value>
    <description>base location for other hdfs directories.</description>
  </property>
</configuration>

Running

df

had shown that my /home partition was essentially my whole hard disk, minus 50GB for my / (root) partition. The default location for HDFS is /tmp/hadoop-myUserId, which is in the / partition; this is where my initial 50GB HDFS size came from.
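To see this split directly (the exact sizes and device names will of course vary by machine), the free space on the two partitions can be compared with:

# compare size, used and available space on the root and /home filesystems
df -h / /home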

Creating a directory for HDFS, and confirming which partition it lives on, was accomplished by

mkdir ~/hdfs
df -P ~/hdfs | tail -1 | cut -d' ' -f 1

Successful implementation was accomplished by

stop-all.sh
start-dfs.sh
hadoop namenode -format
start-all.sh
hadoop dfsadmin -report

which reports the configured capacity of HDFS as the size of my /home partition.
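As an extra sanity check (assuming the default Hadoop 1.x layout under hadoop.tmp.dir), the HDFS data should now live under the new location:

# list the dfs subdirectories created under the new hadoop.tmp.dir
ls ~/hdfs/dfs

where name/ is created by the namenode format and data/ appears once the DataNode has started.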

Thank you jtravaglini for the comment/clue.

answered Sep 21 '22 by teserecter