Get a few lines of HDFS data


I have about 2 GB of data in HDFS.

Is it possible to grab just a sample of that data, the way we do on the Unix command line?

cat iris2.csv | head -n 50 
asked Feb 28 '14 by Unmesha Sreeveni U.B

People also ask

How do I list files in HDFS?

Use the hdfs dfs -ls command, passing the directory whose contents you want to list.
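
For example, to list a directory (the path below is only an illustration):

hdfs dfs -ls /user/hadoop/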


2 Answers

Native head

hadoop fs -cat /your/file | head 

is efficient here, because cat closes the stream as soon as head has finished reading its lines.
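
To get the 50 lines asked for in the question, pass -n to head exactly as you would for a local file:

hadoop fs -cat /your/file | head -n 50 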

To get the end of a file, Hadoop provides a dedicated, efficient command:

hadoop fs -tail /your/file 

Unfortunately, it returns the last kilobyte of the file, not a given number of lines.
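
If you really need the last N lines, one workaround is to pipe the whole file through the Unix tail command. Note that this streams the entire file out of HDFS, so it can be slow on large files:

hadoop fs -cat /your/file | tail -n 50 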

answered Oct 05 '22 by Viacheslav Rodionov


You can use the head command with Hadoop too. The syntax is:

hdfs dfs -cat <hdfs_filename> | head -n 3 

This prints only the first three lines of the file.
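
If you want to keep such a sample for local inspection, you can redirect the output to a local file (the filename below is just an example):

hdfs dfs -cat <hdfs_filename> | head -n 50 > sample_local.csv 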

answered Oct 05 '22 by Piyush Patel