How can I increase the configured capacity of my hadoop DFS from the default 50GB to 100GB?
My present setup is Hadoop 1.2.1 running on a CentOS 6 machine with 120GB of 450GB used. I have set up Hadoop in pseudo-distributed mode with the /conf suggested by "Hadoop: The Definitive Guide", 3rd edition. hdfs-site.xml had only one configured property:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
The following line gave no error feedback; it simply returns to the prompt:
hadoop dfsadmin -setSpaceQuota 100g /tmp/hadoop-myUserID
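(For what it's worth, my understanding is that a space quota only limits usage under that directory; it does not change the configured capacity that dfsadmin reports. Assuming the directory exists in HDFS, the quota can be checked with
hadoop fs -count -q /tmp/hadoop-myUserID
which lists the quota and remaining space quota for the path.)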
If I am in a regen loop (having executed
rm -rf /tmp/hadoop-myUserId
in an attempt to "start from scratch"), this seeming success of setSpaceQuota occurs if and only if I have executed
start-all.sh
hadoop namenode -format
The failure of my DFS capacity configuration is shown by
hadoop dfsadmin -report
which still shows the same 50GB of configured capacity.
I would be willing to switch over to Hadoop 2.2 (now a stable release) if that is currently the best way to get 100GB of HDFS configured capacity. It seems like there should be a configuration property for hdfs-site.xml that would allow me to use more of my free partition.
Set the location of HDFS to a partition with more free space. For Hadoop 1.2.1 this can be done by setting hadoop.tmp.dir in hadoop-1.2.1/conf/core-site.xml:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/myUserID/hdfs</value>
    <description>base location for other hdfs directories.</description>
  </property>
</configuration>
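As an alternative I did not end up using, Hadoop 1.x also appears to let you point the name and data directories at the larger partition directly in hdfs-site.xml via dfs.name.dir and dfs.data.dir, instead of relocating everything under hadoop.tmp.dir. A minimal sketch, with placeholder paths under my /home partition:
<configuration>
  <!-- placeholder paths; adjust to a directory on the larger partition -->
  <property>
    <name>dfs.name.dir</name>
    <value>/home/myUserID/hdfs/name</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/home/myUserID/hdfs/data</value>
  </property>
</configuration>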
Running
df
showed that my /home partition was my hard disk, minus 50GB for my / (root) partition. The default location for HDFS is
/tmp/hadoop-myUserId
which is in the / partition. This is where my initial 50GB HDFS size came from.
Creation of a directory for HDFS, and confirmation of which partition it is located on, was accomplished by
mkdir ~/hdfs
df -P ~/hdfs | tail -1 | cut -d' ' -f 1
Successful implementation was accomplished by running
stop-all.sh
start-dfs.sh
hadoop namenode -format
start-all.sh
hadoop dfsadmin -report
which reports the size of the HDFS as the size of my /home partition.
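As a quick check (assuming the Hadoop 1.2.1 report format), the relevant line can be pulled out with
hadoop dfsadmin -report | grep 'Configured Capacity'
to confirm the new configured capacity.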
Thank you jtravaglini for the comment/clue.