
How to count lines in a file on hdfs command?

I have a file (testfile) on HDFS and I want to know how many lines it contains.

In Linux, I can do:

wc -l <filename> 
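As a quick local illustration (the file path and contents here are made up for the example), `wc -l` counts newline-terminated lines, and reading from stdin prints only the count without the filename:

```shell
# Create a small sample file and count its lines
printf 'line one\nline two\nline three\n' > /tmp/testfile

wc -l /tmp/testfile     # prints the count followed by the filename
wc -l < /tmp/testfile   # prints only the count: 3
```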

Can I do something similar with "hadoop fs" command? I can print file contents with:

hadoop fs -text /user/mklein/testfile 

How do I find out how many lines it has? I want to avoid copying the file to the local filesystem and then running the wc command.

Note: My file is compressed using snappy compression, which is why I have to use -text instead of -cat

Asked by Setsuna on Sep 16 '15

People also ask

How can I count files in Hadoop?

Use hdfs dfs -count to count the number of directories, files, and bytes under the paths that match the specified file pattern; its output columns are DIR_COUNT, FILE_COUNT, CONTENT_SIZE, and PATHNAME. Example: hdfs dfs -count hdfs://nn1.example.com/file1 hdfs://nn2.example.com/file2. With -q, quota information is included: hdfs dfs -count -q hdfs://nn1.example.com/file1.

How do I know how many blocks I have in HDFS?

We can use the HDFS file system check command (hdfs fsck <path> -files -blocks) to see the blocks for a specific file.

How do I list all files in HDFS?

Use the hdfs dfs -ls command to list the files in a directory, specifying the directory location as the argument. Use hdfs dfs -ls -R for a recursive listing of all files under a path.


1 Answer

Total number of files: hadoop fs -ls /path/to/hdfs/* | wc -l

(Note that hadoop fs -ls prints a "Found N items" header line, so subtract one from the count.)

Total number of lines: hadoop fs -cat /path/to/hdfs/* | wc -l

Total number of lines for a given file: hadoop fs -cat /path/to/hdfs/filename | wc -l

Since your file is Snappy-compressed, use -text instead of -cat so the contents are decompressed before counting: hadoop fs -text /path/to/hdfs/filename | wc -l
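The decompress-then-count pipe pattern can be sanity-checked locally without a Hadoop cluster, using gzip as a stand-in for Snappy (the file name and contents below are made up for the example; hadoop fs -text plays the role gzip -dc plays here):

```shell
# Create a compressed sample file with four lines
printf 'a\nb\nc\nd\n' | gzip > /tmp/sample.gz

# Decompress to stdout and count lines, analogous to: hadoop fs -text <file> | wc -l
gzip -dc /tmp/sample.gz | wc -l   # prints 4
```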

Answered by Soumick Dasgupta on Sep 25 '22