I have 2 GB of data in my HDFS.
Is it possible to get a sample of that data, like we do on the Unix command line?
cat iris2.csv | head -n 50
Native head
hadoop fs -cat /your/file | head
is efficient here, as cat will close the stream as soon as head finishes reading all the lines.
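For example, to mirror the Unix one-liner from the question (the HDFS path here is just a placeholder):
hadoop fs -cat /your/file | head -n 50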
To get the tail there is a dedicated command in Hadoop:
hadoop fs -tail /your/file
Unfortunately it returns the last kilobyte of the file, not a given number of lines.
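If you need the last N lines rather than the last kilobyte, one workaround is to pipe the whole stream through the local tail; note that this reads the entire file from HDFS:
hadoop fs -cat /your/file | tail -n 50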
You can use the head command in Hadoop too. The syntax would be
hdfs dfs -cat <hdfs_filename> | head -n 3
This will print only the first three lines of the file.
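Two more notes. Newer Hadoop releases (3.x, as far as I know) also ship a built-in -head subcommand which, like -tail, prints the first kilobyte of the file rather than a line count:
hdfs dfs -head <hdfs_filename>
And for the "randomly" part of the question, a rough sketch is to pipe the stream through GNU shuf; with -n, recent coreutils versions use reservoir sampling, so it should not need to hold the whole 2 GB in memory, though it still reads the entire file:
hdfs dfs -cat <hdfs_filename> | shuf -n 50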