 

Get folder size of HDFS from Java

Tags:

hdfs

I need to get the size of an HDFS folder (one that contains subdirectories) from Java.

From the command line we can use the -dus option, but can anyone help me with how to get the same result using Java?

asked May 16 '13 by user1442237

People also ask

How can I get HDFS folder size?

You can use the hadoop fs -ls command. It lists the files in the given directory along with their details; in the output, the 5th column shows the size of each file in bytes.
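For example, in a hypothetical listing (the path /inputdir is illustrative), the 5th column is the size in bytes:

$ hadoop fs -ls /inputdir
Found 2 items
-rw-r--r--   3 hdfs supergroup    1572864 2013-05-16 07:05 /inputdir/data.txt
drwxr-xr-x   - hdfs supergroup          0 2013-05-16 07:05 /inputdir/sub

Note that directories themselves are listed with size 0, so to total up a directory you would use hadoop fs -du -s (the successor to the deprecated -dus) instead.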

How do I view folders in HDFS?

If you type hdfs dfs -ls /, you will get a list of the directories in HDFS.

What is the size of HDFS?

The default size of an HDFS data block is 128 MB. If blocks are small, there will be too many blocks in Hadoop HDFS and thus too much metadata to store; managing such a huge number of blocks and their metadata creates overhead and increases network traffic.


3 Answers

The getSpaceConsumed() function in the ContentSummary class returns the actual space the file/directory occupies in the cluster, i.e. it takes the cluster's replication factor into account.

For instance, if the replication factor of the Hadoop cluster is set to 3 and the directory size is 1.5 GB, getSpaceConsumed() will return 4.5 GB.

The getLength() function in the ContentSummary class, by contrast, returns the actual file/directory size, ignoring replication.
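For example, a minimal Java sketch showing both calls (assuming the cluster configuration is on the classpath; the class name DirSizeDemo and the path /inputdir are illustrative):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.ContentSummary;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class DirSizeDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Summary of everything under the directory, subdirectories included
        ContentSummary summary = fs.getContentSummary(new Path("/inputdir"));

        // Raw data size, same as hadoop fs -du -s
        System.out.println("Length: " + summary.getLength());

        // Space consumed on the cluster, i.e. length times the replication factor
        System.out.println("Space consumed: " + summary.getSpaceConsumed());
    }
}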

answered Nov 05 '22 by Nikhil Menon


You could use the getContentSummary(Path f) method provided by the FileSystem class. It returns a ContentSummary object on which getSpaceConsumed() can be called, giving you the size of the directory in bytes.

Usage:

package org.myorg.hdfsdemo;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class GetDirSize {

    /**
     * @param args
     * @throws IOException 
     */
    public static void main(String[] args) throws IOException {
        // Load the cluster configuration
        Configuration config = new Configuration();
        config.addResource(new Path(
                "/hadoop/projects/hadoop-1.0.4/conf/core-site.xml"));
        config.addResource(new Path(
                "/hadoop/projects/hadoop-1.0.4/conf/hdfs-site.xml"));
        FileSystem fs = FileSystem.get(config);
        Path filenamePath = new Path("/inputdir");
        System.out.println("SIZE OF THE HDFS DIRECTORY : " + fs.getContentSummary(filenamePath).getSpaceConsumed());
    }

}

HTH

answered Nov 05 '22 by Tariq


Thank you guys.

Scala version

package com.beloblotskiy.hdfsstats.model.hdfs

import java.nio.file.{Files => NioFiles, Paths => NioPaths}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.FileSystem
import org.apache.hadoop.fs.Path
import org.apache.commons.io.IOUtils
import com.beloblotskiy.hdfsstats.common.Settings

/**
 * HDFS utilities
 * @author v-abelablotski
 */
object HdfsOps {
  private val conf = new Configuration()
  conf.addResource(new Path(Settings.pathToCoreSiteXml))
  conf.addResource(new Path(Settings.pathToHdfsSiteXml))
  private val fs = FileSystem.get(conf)

  /**
   * Calculates disk usage including the replication factor.
   * If this function returns 3 GB for a folder with replication factor 3,
   * it means the folder holds 1 GB of data stored as 3 copies.
   */
  def duWithReplication(path: String): Long = {
    val fsPath = new Path(path)
    fs.getContentSummary(fsPath).getSpaceConsumed()
  }

  /**
   * Calculates disk usage without taking the replication factor into account.
   * The result matches that of hadoop fs -du /hdfs/path/to/directory.
   */
  def du(path: String): Long = {
    val fsPath = new Path(path)
    fs.getContentSummary(fsPath).getLength()
  }

  //...
}

answered Nov 05 '22 by beloblotskiy