Why is there no 'hadoop fs -head' shell command?

Tags:

hdfs

A fast method for inspecting files on HDFS is to use tail:

~$ hadoop fs -tail /path/to/file

This displays the last kilobyte of data in the file, which is extremely helpful. However, its counterpart, head, does not appear to be among the available shell commands. I find this very surprising.

My hypothesis is that since HDFS is built for very fast streaming reads of very large files, there is some access-related issue that affects head. This makes me hesitant to work around it to read the start of a file. Does anyone have an answer?

676

asked Nov 04 '13 22:11

bbengfort

1 Answer

I would say it has more to do with efficiency - head can easily be replicated by piping the output of hadoop fs -cat through the Linux head command.

hadoop fs -cat /path/to/file | head

This is efficient, as head closes the underlying stream once the desired number of lines has been output.
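If you use this pattern often, it can be wrapped in a small shell function. The following is only a sketch: it assumes a hadoop binary on the PATH, and the name hdfs_head and its default of 10 lines are made up for illustration, not part of Hadoop.

```shell
# hdfs_head PATH [LINES]: print the first LINES lines (default 10) of an HDFS file.
# NOTE: hdfs_head is a hypothetical helper name, not a built-in Hadoop command.
hdfs_head() {
    # head exits after printing the requested lines, which closes the pipe
    # and stops the cat early instead of streaming the whole file.
    hadoop fs -cat "$1" | head -n "${2:-10}"
}
```

For example, hdfs_head /path/to/file 20 would print the first 20 lines. To mirror what -tail does and take the first kilobyte rather than a number of lines, head -c 1024 could be used in place of head -n.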

Using tail in this manner would be considerably less efficient, as you'd have to stream over the entire file (all HDFS blocks) to find the final x lines.

hadoop fs -cat /path/to/file | tail

The hadoop fs -tail command, as you note, works on the last kilobyte - Hadoop can efficiently find the last block and skip to the position of the final kilobyte, then stream the output. Piping through tail can't do this, because tail cannot seek on a pipe.
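The same contrast can be seen on a local file, as a rough analogy only. Nothing below touches HDFS; the temporary file and its size are invented for the demonstration, and the claim that tail seeks when given a file name reflects how common tail implementations typically behave.

```shell
# Local analogy only; nothing here talks to HDFS.
demo_file="$(mktemp)"
# Build a throwaway file of 100000 numbered lines.
seq 1 100000 > "$demo_file"

# Given a file name, tail can seek near the end and read only the last kilobyte.
direct="$(tail -c 1024 "$demo_file")"

# Given a pipe, tail cannot seek, so cat has to stream every byte first.
piped="$(cat "$demo_file" | tail -c 1024)"

# Both approaches yield the same bytes; only the amount of data read differs.
[ "$direct" = "$piped" ] && echo "same output"

rm -f "$demo_file"
```

On a large HDFS file the difference in data read is what matters: the piped form pulls every block across the network, while the seek-based form touches only the end of the file.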

118

answered Sep 20 '22 17:09

Chris White
