I have a file (testfile) on HDFS and I want to know how many lines it contains.
In Linux, I can do:
wc -l <filename>
Can I do something similar with the "hadoop fs" command? I can print the file contents with:
hadoop fs -text /user/mklein/testfile
How do I find out how many lines it has? I want to avoid copying the file to the local filesystem and then running the wc command.
Note: my file is compressed with snappy, which is why I have to use -text instead of -cat.
hdfs dfs -count counts the number of directories, files, and bytes under the paths that match the specified file pattern. For example:
hdfs dfs -count hdfs://nn1.example.com/file1 hdfs://nn2.example.com/file2
hdfs dfs -count -q hdfs://nn1.example.com/file1
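The default -count output has one line per path with the columns DIR_COUNT, FILE_COUNT, CONTENT_SIZE, and PATHNAME. As an illustrative sketch, using the question's own directory rather than the example hosts above:
hdfs dfs -count /user/mklein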
We can use the Hadoop file system check command (fsck) to find the blocks for a specific file.
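For example (a sketch using the file path from the question), the following prints the file's size and its block list:
hdfs fsck /user/mklein/testfile -files -blocks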
Total number of files: hadoop fs -ls /path/to/hdfs/* | wc -l
Total number of lines: hadoop fs -cat /path/to/hdfs/* | wc -l
Total number of lines for a given file: hadoop fs -cat /path/to/hdfs/filename | wc -l
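Note that -cat only works for plain text; because the file in the question is snappy-compressed, pipe -text (which decompresses the data) into wc instead:
hadoop fs -text /user/mklein/testfile | wc -l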