Get folder size of HDFS from Java



I need to get the size of an HDFS folder that contains subdirectories, from Java.

From the command line we can use the -dus option, but can anyone help me get the same result using Java?

3 Answers

The getSpaceConsumed() method of the ContentSummary class returns the actual space the file or directory occupies in the cluster, i.e. it takes the replication factor into account.

For instance, if the replication factor in the Hadoop cluster is set to 3 and the directory size is 1.5 GB, getSpaceConsumed() will return 4.5 GB.

The getLength() method of the ContentSummary class returns the actual file or directory size, without replication.
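The relationship between the two values can be sketched with plain arithmetic. This is only an illustration, not Hadoop code: the class ReplicationMath and the method spaceConsumed are hypothetical names, and the sketch assumes every file under the directory has the same replication factor (on a real cluster files can have different replication factors, so the reported value need not be an exact multiple).

```java
final class ReplicationMath {

    /**
     * Raw cluster space for a given logical size, assuming one uniform
     * replication factor for all files (a simplification).
     */
    static long spaceConsumed(long logicalBytes, int replicationFactor) {
        return Math.multiplyExact(logicalBytes, (long) replicationFactor);
    }

    public static void main(String[] args) {
        long gib = 1024L * 1024 * 1024;
        long length = 3 * gib / 2;                 // 1.5 GB, the kind of value getLength() reports
        long consumed = spaceConsumed(length, 3);  // the kind of value getSpaceConsumed() reports

        System.out.println(length / (double) gib + " GB logical");    // 1.5 GB logical
        System.out.println(consumed / (double) gib + " GB consumed"); // 4.5 GB consumed
    }
}
```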

Nikhil Menon

You can use the getContentSummary(Path f) method of the FileSystem class. It returns a ContentSummary object, and calling getSpaceConsumed() on it gives the space the directory consumes in bytes.

Usage:

package org.myorg.hdfsdemo;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class GetDirSize {

    /**
     * @param args
     * @throws IOException
     */
    public static void main(String[] args) throws IOException {
        Configuration config = new Configuration();
        // Add the cluster's configuration resources here (paths elided):
        // config.addResource(new Path(
        // config.addResource(new Path(
        FileSystem fs = FileSystem.get(config);
        Path filenamePath = new Path("/inputdir");
        // getContentSummary() aggregates over the whole directory tree
        System.out.println("SIZE OF THE HDFS DIRECTORY : " + fs.getContentSummary(filenamePath).getSpaceConsumed());
    }
}

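To make concrete what a size "including subdirectories" means, here is a rough analogy against the local filesystem using java.nio instead of HDFS. It is a sketch only: LocalDirSize and sizeOf are hypothetical names, and the Path here is java.nio.file.Path, not Hadoop's Path. On HDFS you would not walk the tree yourself, since the getContentSummary() call above already returns the aggregated figure.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

final class LocalDirSize {

    /** Sums the sizes of all regular files under root, descending into subdirectories. */
    static long sizeOf(Path root) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                    .filter(Files::isRegularFile)
                    .mapToLong(p -> {
                        try {
                            return Files.size(p);
                        } catch (IOException e) {
                            throw new UncheckedIOException(e);
                        }
                    })
                    .sum();
        }
    }

    public static void main(String[] args) throws IOException {
        // Build a small throwaway tree: one file at the top, one in a subdirectory.
        Path root = Files.createTempDirectory("dirsize-demo");
        Path sub = Files.createDirectories(root.resolve("sub"));
        Files.write(root.resolve("a.bin"), new byte[100]);
        Files.write(sub.resolve("b.bin"), new byte[250]);

        System.out.println("Logical size in bytes: " + sizeOf(root)); // 350
    }
}
```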

Scala version

package com.beloblotskiy.hdfsstats.model.hdfs

import java.nio.file.{Files => NioFiles, Paths => NioPaths}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.FileSystem
import org.apache.hadoop.fs.Path
import org.apache.commons.io.IOUtils
import com.beloblotskiy.hdfsstats.common.Settings

/**
 * HDFS utilities
 * @author v-abelablotski
 */
object HdfsOps {
  private val conf = new Configuration()
  conf.addResource(new Path(Settings.pathToCoreSiteXml))
  conf.addResource(new Path(Settings.pathToHdfsSiteXml))
  private val fs = FileSystem.get(conf)

  /**
   * Calculates disk usage including the replication factor.
   * If the function returns 3G for a folder with replication factor = 3, HDFS holds 1G of file data stored as 3 copies.
   */
  def duWithReplication(path: String): Long = {
    val fsPath = new Path(path)
    fs.getContentSummary(fsPath).getSpaceConsumed
  }

  /**
   * Calculates disk usage without taking the replication factor into account.
   * The result will be the same as hadoop fs -du /hdfs/path/to/directory
   */
  def du(path: String): Long = {
    val fsPath = new Path(path)
    fs.getContentSummary(fsPath).getLength
  }
}
