I'm new to the MapReduce framework. I want to find the number of files under a specific directory, given the name of that directory. For example, suppose we have 3 directories A, B, and C, containing 20, 30, and 40 part-r files respectively. I want to write a Hadoop job that counts the files/records in each directory, i.e. I want the output in a .txt file formatted as below:
A is having 20 records
B is having 30 records
C is having 40 records
All of these directories are in HDFS.
The Hadoop fs shell command -count counts the directories, files, and bytes under the paths that match the specified file pattern. Options: -q shows quotas (a quota is the hard limit on the number of names and the amount of space used for individual directories); -u limits the output to quotas and usage only.
You can use the "hadoop fs -ls" command. It displays the list of files in the current directory along with all their details. In the output of this command, the 5th column shows the size of each file in bytes.
Hadoop ls command: the ls command in Hadoop lists the files/contents of a specified directory, i.e., path. Each entry shows details such as permissions, owner, group, size, modification time, and name. Adding -R before /path makes the listing recursive, so the contents of every subdirectory are shown as well.
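As a rough sketch of how a recursive listing could answer the original question, the helper below groups the regular files in `hadoop fs -ls -R` output by first-level subdirectory. The function name `per_dir_counts` and the `/data` path are hypothetical. It assumes the usual 8-column listing with the path last, paths without spaces, and that the listed paths start with the root exactly as it was passed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `hadoop fs -ls -R ROOT` output on stdin and prints
# one "<dir> is having <n> records" line per first-level subdirectory of ROOT.
per_dir_counts() {
  local root="${1%/}"                        # drop a trailing slash, if any
  awk -v root="$root" '
    /^-/ {                                   # regular files only (directories start with "d")
      rel = substr($8, length(root) + 2)     # path relative to ROOT
      n = split(rel, parts, "/")
      if (n > 1) count[parts[1]]++           # skip files that sit directly in ROOT
    }
    END { for (d in count) printf "%s is having %d records\n", d, count[d] }
  ' | sort
}

# Possible use on a cluster (the path is a placeholder):
#   hadoop fs -ls -R /data | per_dir_counts /data > output.txt
```

Every regular file is counted, so marker files such as _SUCCESS would be included unless they are filtered out first.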
You can use hadoop fs -ls command to list files in the current directory as well as their details. The 5th column in the command output contains file size in bytes.
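To illustrate the 5th column, here is a small sketch that adds up the sizes of the regular files in a listing. The function name `total_bytes` is made up for this example, and it assumes the standard `-ls` column layout.

```shell
#!/usr/bin/env bash
# Hypothetical helper: sums column 5 (size in bytes) over the regular-file
# lines of `hadoop fs -ls` output read from stdin.
total_bytes() {
  awk '/^-/ { sum += $5 } END { printf "%.0f\n", sum }'
}

# Possible use (the path is a placeholder):
#   hadoop fs -ls /path/to/your/dir | total_bytes
```

Lines that do not start with "-" (the "Found N items" header and directory entries) are ignored.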
The simplest/native approach is to use the built-in hdfs commands, in this case -count:
hdfs dfs -count /path/to/your/dir >> output.txt
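The raw -count output has four columns (DIR_COUNT, FILE_COUNT, CONTENT_SIZE, PATHNAME), not the sentence format asked for in the question. A minimal sketch of one way to reshape it follows. The function name `format_counts` and the `/data/...` paths are hypothetical, and it assumes -count is run without -q and that paths contain no spaces.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `hdfs dfs -count` output on stdin
# (DIR_COUNT FILE_COUNT CONTENT_SIZE PATHNAME) and prints
# "<dir> is having <FILE_COUNT> records" for each line.
format_counts() {
  awk '{
    n = split($4, parts, "/")
    name = parts[n]
    if (name == "" && n > 1) name = parts[n - 1]   # tolerate a trailing slash
    printf "%s is having %s records\n", name, $2
  }'
}

# Possible use on a cluster (paths are placeholders):
#   hdfs dfs -count /data/A /data/B /data/C | format_counts > output.txt
```

FILE_COUNT covers every file under the path, including files in nested subdirectories, so the number can be higher than the number of part-r files alone.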
Or if you prefer a mixed approach via Linux commands:
hadoop fs -ls /path/to/your/dir/* | wc -l >> output.txt
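One caveat with piping to wc -l is that it counts every output line, so any "Found N items" header and any directory entries are included in the total. A hedged alternative is to count only the lines that start with "-", which is how regular files appear in the listing. The function name `count_plain_files` is an assumption for this sketch.

```shell
#!/usr/bin/env bash
# Hypothetical helper: counts regular-file lines in `hadoop fs -ls` output
# read from stdin, skipping "Found N items" headers and directory entries.
count_plain_files() {
  grep -c '^-' || true   # grep exits non-zero when the count is 0; still prints 0
}

# Possible use (the path is a placeholder):
#   hadoop fs -ls /path/to/your/dir | count_plain_files >> output.txt
```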
Finally the MapReduce version has already been answered here:
How do I count the number of files in HDFS from an MR job?
Code:
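import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocatedFileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;

// Fragment: goes inside a method of a class that provides getConf(),
// e.g. one extending Configured; the method must handle IOException.
int count = 0;
FileSystem fs = FileSystem.get(getConf());
boolean recursive = false; // true would also count files in subdirectories
RemoteIterator<LocatedFileStatus> ri = fs.listFiles(new Path("hdfs://my/path"), recursive);
// listFiles returns files only, so directories are never counted
while (ri.hasNext()) {
    count++;
    ri.next();
}
System.out.println("The count is: " + count);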