A fast method for inspecting files on HDFS is to use tail:
~$ hadoop fs -tail /path/to/file
This displays the last kilobyte of data in the file, which is extremely helpful. However, the opposite command, head, does not appear to be part of the shell command collection, which I find very surprising.
My hypothesis is that since HDFS is built for very fast streaming reads on very large files, there is some access-oriented issue that affects head. This makes me hesitant to read the start of a file by other means. Does anyone have an answer?
I would say it has more to do with efficiency: head can easily be replicated by piping the output of hadoop fs -cat through the Linux head command.
hadoop fs -cat /path/to/file | head
This is efficient because head closes the underlying stream once the desired number of lines has been output.
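To see the early-exit behaviour without a cluster, here is a small local sketch. It assumes `seq` as a stand-in for `hadoop fs -cat` (a producer with far more output than we want); the variable names are just for illustration.

```shell
# Stand-in producer: seq would emit a million lines if nothing stopped it.
# head exits after 3 lines, the pipe closes, and the producer is cut short
# instead of streaming the rest.
# The `|| true` only matters in shells running with `set -e -o pipefail`,
# where the producer's early termination would otherwise count as a failure.
first_lines=$(seq 1 1000000 | head -n 3) || true

printf '%s\n' "$first_lines"

# Count what actually came through the pipe.
line_count=$(printf '%s\n' "$first_lines" | wc -l | tr -d ' ')
echo "lines received: $line_count"
```

The same shape applies to `hadoop fs -cat /path/to/file | head -n 3`: only as much of the file as head asks for should need to be read before the pipe closes.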
Using tail in this manner would be considerably less efficient, as you would have to stream the entire file (all HDFS blocks) to find the final x lines.
hadoop fs -cat /path/to/file | tail
The hadoop fs -tail command, as you note, works on the last kilobyte: Hadoop can efficiently find the last block and skip to the position of the final kilobyte, then stream the output. Piping via tail can't easily do this.
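A rough local analogy of the difference, assuming a scratch file created with `mktemp` in place of a large HDFS file (all file and variable names here are hypothetical). Given a regular file, tail can typically seek close to the end; given a pipe, it has to consume everything first. Both give the same bytes, but only the second has to read the whole file.

```shell
# Hypothetical scratch file standing in for a large HDFS file.
tmpfile=$(mktemp)
seq 1 2000 > "$tmpfile"

# Direct access: tail is handed the file itself and can jump near the end.
tail -c 1024 "$tmpfile" > "$tmpfile.seek"

# Piped access: tail only sees a stream, so every byte from cat passes
# through it before it knows which 1024 bytes are the last ones.
cat "$tmpfile" | tail -c 1024 > "$tmpfile.stream"

# Same result either way; the cost of getting there is what differs.
seek_bytes=$(wc -c < "$tmpfile.seek" | tr -d ' ')
if cmp -s "$tmpfile.seek" "$tmpfile.stream"; then
  same_output=yes
else
  same_output=no
fi
echo "bytes: $seek_bytes, identical: $same_output"

rm -f "$tmpfile" "$tmpfile.seek" "$tmpfile.stream"
```

On HDFS the gap is wider, since the streamed version pulls every block of the file over the network just to keep the last few lines.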