File count in an HDFS directory


In Java code, I want to connect to a directory in HDFS, find out how many files it contains, get their names, and read them. I can already read the files, but I couldn't figure out how to count the files in a directory and get their names as I would for an ordinary directory.

To read the files, I use DFSClient and open each file as an InputStream.

user1125953 asked Dec 04 '13 17:12


1 Answer

count

Usage: hadoop fs -count [-q] <paths> 

Count the number of directories, files and bytes under the paths that match the specified file pattern. The output columns are: DIR_COUNT, FILE_COUNT, CONTENT_SIZE, FILE_NAME.

The output columns with -q are: QUOTA, REMAINING_QUOTA, SPACE_QUOTA, REMAINING_SPACE_QUOTA, DIR_COUNT, FILE_COUNT, CONTENT_SIZE, FILE_NAME.

Example:

hadoop fs -count hdfs://nn1.example.com/file1 hdfs://nn2.example.com/file2
hadoop fs -count -q hdfs://nn1.example.com/file1

Exit Code:

Returns 0 on success and -1 on error.
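Since the question is about Java, one way to use the count command from a program is to parse a line of its output. The following is only a minimal sketch: it assumes the default four-column form described above (DIR_COUNT, FILE_COUNT, CONTENT_SIZE, FILE_NAME, separated by whitespace), and the class name CountLine and the sample line are hypothetical, not taken from Hadoop.

```java
// Hypothetical helper: holds the four columns of one `hadoop fs -count` output line.
class CountLine {
    final long dirCount;
    final long fileCount;
    final long contentSize;
    final String path;

    CountLine(long dirCount, long fileCount, long contentSize, String path) {
        this.dirCount = dirCount;
        this.fileCount = fileCount;
        this.contentSize = contentSize;
        this.path = path;
    }

    // Parses a line in the assumed form: DIR_COUNT FILE_COUNT CONTENT_SIZE FILE_NAME.
    // The split limit of 4 keeps a path that contains spaces in one piece.
    static CountLine parse(String line) {
        String[] parts = line.trim().split("\\s+", 4);
        if (parts.length != 4) {
            throw new IllegalArgumentException("expected 4 columns: " + line);
        }
        return new CountLine(
                Long.parseLong(parts[0]),
                Long.parseLong(parts[1]),
                Long.parseLong(parts[2]),
                parts[3]);
    }

    public static void main(String[] args) {
        // Made-up sample line, not real command output.
        CountLine c = parse("           1            3               2048 /user/demo");
        System.out.println(c.fileCount + " files under " + c.path);
    }
}
```

This sketch does not handle the eight-column -q form, and it only gives a count, not the file names.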

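From Java, you can use the FileSystem API and iterate over the files under the path. Here is some example code:

    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.LocatedFileStatus;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.fs.RemoteIterator;

    // getConf() assumes the enclosing class extends Configured.
    int count = 0;
    FileSystem fs = FileSystem.get(getConf());
    boolean recursive = false; // do not descend into subdirectories
    RemoteIterator<LocatedFileStatus> ri = fs.listFiles(new Path("hdfs://my/path"), recursive);
    while (ri.hasNext()) {
        count++;
        ri.next(); // the returned status also gives the name: getPath().getName()
    }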
user2486495 answered Sep 20 '22 14:09