In Java code, I want to connect to a directory in HDFS, learn the number of files in that directory, get their names, and read them. I can already read the files, but I couldn't figure out how to count the files in a directory or get the file names the way I would with an ordinary directory.
In order to read I use DFSClient and open files into InputStream.
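For context, reading an HDFS file into an InputStream can also be done through the higher-level FileSystem API; a minimal sketch of that approach (the file path is a placeholder):

import java.io.BufferedReader;
import java.io.InputStreamReader;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Open an HDFS file and read it line by line.
FileSystem fs = FileSystem.get(new Configuration());
try (FSDataInputStream in = fs.open(new Path("/some/file.txt")); // placeholder path
     BufferedReader reader = new BufferedReader(new InputStreamReader(in))) {
    String line;
    while ((line = reader.readLine()) != null) {
        System.out.println(line);
    }
}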
You can also use the hdfs dfs -ls command to list files in Hadoop archives: run hdfs dfs -ls with the archive directory location. Note that the -p (parent) argument given when the archive was created determines the path the files are archived relative to, here /user/.
If the -skipTrash option is specified, the trash, if enabled, is bypassed and the specified file(s) are deleted immediately. This can be useful when you need to delete files from an over-quota directory. Example: hdfs dfs -rm -r -skipTrash /user/hadoop/dir.
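The Java API draws the same distinction: FileSystem.delete removes a path immediately, much like -skipTrash, while the org.apache.hadoop.fs.Trash helper moves it to the trash first. A minimal sketch under those assumptions (the path is a placeholder):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.Trash;

Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(conf);
Path dir = new Path("/user/hadoop/dir"); // placeholder path

// Immediate deletion, comparable to -skipTrash (second argument = recursive):
fs.delete(dir, true);

// Alternatively, move to trash first (takes effect only if fs.trash.interval > 0):
// Trash.moveToAppropriateTrash(fs, dir, conf);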
You can use the hadoop fs -ls command to list the files in a directory (the user's home directory, if no path is given) along with their details. The fifth column of the output is the file size in bytes.
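The Java-side equivalent of -ls, which also covers the "get their names" part of the question, is FileSystem.listStatus. A minimal sketch (the directory path is a placeholder):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Print the name and size of each entry in a directory, much like hadoop fs -ls.
FileSystem fs = FileSystem.get(new Configuration());
FileStatus[] statuses = fs.listStatus(new Path("/user/hadoop/dir")); // placeholder path
System.out.println("Number of entries: " + statuses.length);
for (FileStatus status : statuses) {
    // status.isFile() / status.isDirectory() distinguishes files from subdirectories if needed
    System.out.println(status.getPath().getName() + "\t" + status.getLen() + " bytes");
}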
count
Usage: hadoop fs -count [-q] <paths>
Count the number of directories, files, and bytes under the paths that match the specified file pattern. The output columns are: DIR_COUNT, FILE_COUNT, CONTENT_SIZE, FILE_NAME.
The output columns with -q are: QUOTA, REMAINING_QUOTA, SPACE_QUOTA, REMAINING_SPACE_QUOTA, DIR_COUNT, FILE_COUNT, CONTENT_SIZE, FILE_NAME.
Example:
hadoop fs -count hdfs://nn1.example.com/file1 hdfs://nn2.example.com/file2
hadoop fs -count -q hdfs://nn1.example.com/file1
Exit Code:
Returns 0 on success and -1 on error.
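If you want the same counts from Java rather than the shell, FileSystem.getContentSummary exposes the numbers that hadoop fs -count prints. A minimal sketch (the path is a placeholder):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.ContentSummary;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// DIR_COUNT, FILE_COUNT and CONTENT_SIZE for a whole subtree, as in hadoop fs -count.
FileSystem fs = FileSystem.get(new Configuration());
ContentSummary summary = fs.getContentSummary(new Path("/user/hadoop/dir")); // placeholder path
System.out.println("Directories: " + summary.getDirectoryCount());
System.out.println("Files:       " + summary.getFileCount());
System.out.println("Bytes:       " + summary.getLength());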
You can just use the FileSystem API and iterate over the files under the path. Here is some example code (imports added; "hdfs://my/path" is a placeholder):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocatedFileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;

// Count the files under a path by iterating over the listing.
int count = 0;
FileSystem fs = FileSystem.get(new Configuration()); // or FileSystem.get(getConf()) inside a Configured tool
boolean recursive = false; // true would also count files in subdirectories
RemoteIterator<LocatedFileStatus> ri = fs.listFiles(new Path("hdfs://my/path"), recursive);
while (ri.hasNext()) {
    ri.next();
    count++;
}
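Note that listFiles returns only files (directories are not included in the iteration), and with recursive set to true it walks the entire subtree; if you also want directory entries, use listStatus instead.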