I know that from the terminal, one can use the find command to locate files, such as:
find . -type d -name "*something*" -maxdepth 4
But when I am in the Hadoop file system, I have not found a way to do this:
hadoop fs -find ....
throws an error.
How do people traverse files in Hadoop? I'm using Hadoop 2.6.0-cdh5.4.1.
hadoop fs -find
was introduced in Apache Hadoop 2.7.0. Most likely you're using an older version, hence you don't have it yet. See HADOOP-8989 for more information.
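Once you're on 2.7.0 or later, the usage is similar in spirit to the Unix find, though it only supports a small set of expressions (-name, -iname, -print), so there is no direct equivalent of -type d or -maxdepth. A minimal example (the path here is just a placeholder):

hadoop fs -find / -name "*something*" -print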
In the meantime you can use
hdfs dfs -ls -R <pattern>
e.g.: hdfs dfs -ls -R /demo/order*.*
but that's not as powerful as 'find', of course, and it lacks some basics. From what I understand, people have been writing scripts around it to get over this problem; a sketch of such a wrapper follows.
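For instance, a minimal sketch of such a wrapper (the script name hdfs-find.sh and its interface are my own invention; it assumes HDFS paths contain no spaces and only emulates find's -name matching):

#!/usr/bin/env bash
# hdfs-find.sh: emulate a limited 'find -name' on HDFS by filtering 'hdfs dfs -ls -R'.
# Usage:   ./hdfs-find.sh <hdfs-path> '<name-pattern>'
# Example: ./hdfs-find.sh /demo '*order*'
path="$1"
pattern="$2"
# 'hdfs dfs -ls -R' prints one entry per line; the path is the 8th field
# (permissions, replication, owner, group, size, date, time, path).
hdfs dfs -ls -R "$path" | awk '{print $8}' | while read -r p; do
  # Match the pattern against the basename, like find -name would.
  case "$(basename "$p")" in
    $pattern) echo "$p" ;;
  esac
done

Quote the pattern when calling the script so your local shell doesn't expand it before the script sees it.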