I have to get the size of an HDFS folder (which has subdirectories) from Java.
From the command line we can use the -dus option, but can anyone help me with how to get the same using Java?
2 Answers

You can use the hadoop fs -ls command. This command displays the list of files in the current directory along with their details. In the output of this command, the fifth column displays the size of each file in bytes.
If you type hdfs dfs -ls / you will get a list of the directories in HDFS.
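The same listing is available from Java via FileSystem.listStatus. A minimal sketch, assuming the cluster configuration is on the classpath and using /inputdir as a placeholder path:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ListDirSizes {
    public static void main(String[] args) throws Exception {
        // Assumes core-site.xml/hdfs-site.xml are on the classpath.
        FileSystem fs = FileSystem.get(new Configuration());
        // listStatus() is the programmatic equivalent of "hadoop fs -ls":
        // one FileStatus per entry; getLen() is the size column, in bytes.
        for (FileStatus status : fs.listStatus(new Path("/inputdir"))) {
            System.out.println(status.getPath() + "\t" + status.getLen() + " bytes");
        }
    }
}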
The default size of an HDFS data block is 128 MB. If blocks are small, there will be too many blocks in HDFS and thus too much metadata to store. Managing such a huge number of blocks and their metadata creates overhead on the NameNode and increases network traffic.
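If you want to check the block size programmatically, a minimal sketch (again assuming the configuration is on the classpath):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class ShowBlockSize {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        // Default block size (in bytes) used for new files,
        // e.g. 134217728 bytes = 128 MB.
        long blockSize = fs.getDefaultBlockSize();
        System.out.println("Default block size: " + blockSize / (1024 * 1024) + " MB");
    }
}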
The getSpaceConsumed() function in the ContentSummary class will return the actual space the file/directory occupies in the cluster, i.e. it takes into account the replication factor set for the cluster. For instance, if the replication factor in the Hadoop cluster is set to 3 and the directory size is 1.5 GB, the getSpaceConsumed() function will return the value 4.5 GB. Using the getLength() function in the ContentSummary class will return the actual file/directory size.
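To see the difference, here is a minimal sketch printing both values side by side (configuration assumed to be on the classpath; /inputdir is a placeholder):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.ContentSummary;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class DirSizeComparison {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        ContentSummary summary = fs.getContentSummary(new Path("/inputdir"));
        long logical = summary.getLength();         // raw size of the files
        long physical = summary.getSpaceConsumed(); // size including replication
        System.out.println("getLength()       : " + logical + " bytes");
        System.out.println("getSpaceConsumed(): " + physical + " bytes");
        // With replication factor 3, physical is roughly 3 * logical.
    }
}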
You could use the getContentSummary(Path f) method provided by the FileSystem class. It returns a ContentSummary object, on which the getSpaceConsumed() method can be called to give you the size of the directory in bytes.

Usage:
package org.myorg.hdfsdemo;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class GetDirSize {

    public static void main(String[] args) throws IOException {
        // Load the cluster configuration (core-site.xml and hdfs-site.xml).
        Configuration config = new Configuration();
        config.addResource(new Path(
                "/hadoop/projects/hadoop-1.0.4/conf/core-site.xml"));
        config.addResource(new Path(
                "/hadoop/projects/hadoop-1.0.4/conf/hdfs-site.xml"));

        FileSystem fs = FileSystem.get(config);
        Path filenamePath = new Path("/inputdir");
        // getSpaceConsumed() returns the directory size in bytes,
        // including the replication factor.
        System.out.println("SIZE OF THE HDFS DIRECTORY : "
                + fs.getContentSummary(filenamePath).getSpaceConsumed());
    }
}
HTH
Thank you guys.
Scala version
package com.beloblotskiy.hdfsstats.model.hdfs

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.FileSystem
import org.apache.hadoop.fs.Path

import com.beloblotskiy.hdfsstats.common.Settings

/**
 * HDFS utilities
 * @author v-abelablotski
 */
object HdfsOps {
  private val conf = new Configuration()
  conf.addResource(new Path(Settings.pathToCoreSiteXml))
  conf.addResource(new Path(Settings.pathToHdfsSiteXml))
  private val fs = FileSystem.get(conf)

  /**
   * Calculates disk usage with the replication factor.
   * If this function returns 3 GB for a folder with replication factor = 3,
   * it means the folder holds 1 GB of files occupying 3 GB of cluster space.
   */
  def duWithReplication(path: String): Long = {
    val fsPath = new Path(path)
    fs.getContentSummary(fsPath).getSpaceConsumed()
  }

  /**
   * Calculates disk usage without taking the replication factor into account.
   * The result matches: hadoop fs -du /hdfs/path/to/directory
   */
  def du(path: String): Long = {
    val fsPath = new Path(path)
    fs.getContentSummary(fsPath).getLength()
  }

  //...
}