I would like to know whether there is a command or expression that gets only the file name in Hadoop. I need to fetch only the name of each file, but when I run hadoop fs -ls it prints the whole path.
I tried the command below, but I am wondering whether there is a better way to do it.
hadoop fs -ls <HDFS_DIR>|cut -d ' ' -f17
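A fixed field number such as -f17 only works by accident. The sketch below uses two made-up listing lines (sample data, not real cluster output) and assumes that, as is typical for hadoop fs -ls, the size column is right-aligned. A short size is therefore preceded by more padding spaces than a long one. Because cut -d ' ' treats every single space as a delimiter, the path lands in a different field on each line.

```shell
# Two made-up lines from one listing; the size column is right-aligned,
# so the smaller file gets more padding spaces in front of its size.
line_a='-rw-r--r--   3 hdfs supergroup       1024 2021-02-04 10:16 /data/a.txt'
line_b='-rw-r--r--   3 hdfs supergroup   10485760 2021-02-04 10:17 /data/b.txt'

# cut splits on each single space, so runs of padding become empty fields
# and shift the position of the path.
printf '%s\n' "$line_a" | cut -d ' ' -f16   # /data/a.txt
printf '%s\n' "$line_b" | cut -d ' ' -f16   # empty line: only 12 fields
printf '%s\n' "$line_b" | cut -d ' ' -f12   # /data/b.txt

# awk's default splitting treats any run of blanks as one separator,
# so the path is field 8 on both lines.
printf '%s\n' "$line_a" "$line_b" | awk '{ print $8 }'
```

This is why the answers below either collapse the whitespace first or let awk do the splitting.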
Use the hdfs dfs -ls command to list files in Hadoop archives, passing the archive directory location as the path. Note that the modified parent argument causes the files to be archived relative to /user/.
Usage: hadoop fs -ls [-d] [-h] [-R] [-t] [-S] [-r] [-u] <args>
Options:
-d: Directories are listed as plain files.
-h: Format file sizes in a human-readable fashion (e.g. 64.0m instead of 67108864).
-R: Recursively list subdirectories encountered.
-t: Sort output by modification time (most recent first).
The ls command in Hadoop lists the contents of a specified directory (path), showing details such as permissions, replication, owner, group, size, modification time, and name for each entry. Adding -R before the path makes the listing recursive, so the contents of every subdirectory are listed as well.
Use -R with the ls command to list files and directories recursively. The other options behave as described above: -d lists directories as plain files, and -h formats file sizes in a human-readable fashion rather than as a number of bytes.
The following command will return filenames only:
hdfs dfs -stat "%n" 'my/path/*'
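-stat also accepts other format specifiers, so it can filter by entry type without parsing ls columns. The sketch below assumes the %F specifier (entry type, printed as "regular file" or "directory") and uses a hypothetical helper, regular_file_names, that keeps only regular files from `-stat "%F|%n"` output. The '|' separator is an arbitrary choice. It only requires that the type text never contains '|'.

```shell
# Hypothetical helper: read lines of the form "<type>|<name>" (as produced by
# hdfs dfs -stat "%F|%n") and print the names of regular files only.
# Only the first '|' is treated as the separator, so names may contain spaces
# or further '|' characters.
regular_file_names() {
  awk '{
    i = index($0, "|")
    if (i > 0 && substr($0, 1, i - 1) == "regular file")
      print substr($0, i + 1)
  }'
}

# Usage (needs a Hadoop client on PATH; the glob is quoted so HDFS expands it):
#   hdfs dfs -stat "%F|%n" 'my/path/*' | regular_file_names
```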
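Update (added Feb 04 '21): actually, for the last few years I have been using the following two commands. Note that they print the path column ($8) rather than the bare file name: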
hdfs dfs -ls -d 'my/path/*' | awk '{print $8}'
and, to keep only regular files (lines whose permission string starts with -):
hdfs dfs -ls my/path | grep -e "^-" | awk '{print $8}'
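To go from the path column to just the name, awk can split field 8 on '/' and keep the last piece. The sketch below is a hypothetical helper, ls_file_names. It folds the grep "^-" filter into the awk pattern. Like the awk commands above, it assumes file names contain no whitespace, because field 8 stops at the first space.

```shell
# Hypothetical helper: from `hdfs dfs -ls` output, keep regular-file lines
# (permission string starting with '-') and print the last '/'-separated
# component of the path in field 8.
ls_file_names() {
  awk '/^-/ { n = split($8, parts, "/"); print parts[n] }'
}

# Usage (needs a Hadoop client on PATH):
#   hdfs dfs -ls my/path | ls_file_names
```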
It seems hadoop ls does not support any options to output just the filenames, or even just the last column.
If you want to get the last column reliably, first collapse each run of spaces into a single space so that you can then address the last column by its field number:
hadoop fs -ls | sed '1d;s/  */ /g' | cut -d ' ' -f8
This gives you only the last column, but each file still has its whole path. If you want just the file names, you can use basename as @rojomoke suggests:
hadoop fs -ls | sed '1d;s/  */ /g' | cut -d ' ' -f8 | xargs -n 1 basename
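xargs -n 1 basename starts one basename process per file, and xargs also splits its input on blanks and interprets quotes. A minimal alternative, sketched here as a hypothetical helper called strip_dirs, reads one path per line and removes everything up to the last '/' with shell parameter expansion.

```shell
# Hypothetical helper: print the last component of each path read from stdin,
# one path per line, without spawning a basename process for every file.
strip_dirs() {
  local path
  # IFS= and -r keep leading/trailing spaces and backslashes intact.
  while IFS= read -r path; do
    printf '%s\n' "${path##*/}"
  done
}

# Usage, in place of `xargs -n 1 basename` (needs a Hadoop client on PATH):
#   hadoop fs -ls | sed '1d;s/  */ /g' | cut -d ' ' -f8 | strip_dirs
```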
I also filtered out the first line, which says "Found x items".
Note: as @felix-frank points out in the comments, the above command does not correctly preserve file names that contain multiple consecutive spaces. Hence this more correct solution, proposed by Felix:
hadoop fs -ls /tmp | sed 1d | perl -wlne'print +(split " ",$_,8)[7]'
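If perl is not available, the same idea can be expressed with sed. The perl command splits each line into at most 8 fields, so the 8th keeps the rest of the line, internal spaces included. The sketch below (a hypothetical helper, ls_paths) instead deletes the first seven whitespace-separated fields: permissions, replication, owner, group, size, date, and time. It assumes plain byte sizes (no -h) and skips the header because that line does not start with a permission string.

```shell
# Hypothetical helper: print the path part of `hadoop fs -ls` lines by deleting
# the first seven whitespace-separated fields; runs of spaces inside the path
# are left untouched. Only lines starting with '-' (file) or 'd' (directory)
# are considered, which skips the "Found N items" header.
ls_paths() {
  sed -nE '/^[-d]/ s/^([^[:space:]]+[[:space:]]+){7}//p'
}

# Usage (needs a Hadoop client on PATH):
#   hadoop fs -ls /tmp | ls_paths
```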