
Is there an equivalent of the `find` command in `hadoop`?

I know that from the terminal, one can use the find command to locate files, for example:

find . -maxdepth 4 -type d -name "*something*"

But, when I am in the hadoop file system, I have not found a way to do this.

hadoop fs -find ....

throws an error.

How do people traverse files in hadoop? I'm using hadoop 2.6.0-cdh5.4.1.

asked Oct 01 '15 20:10 by makansij



1 Answer

hadoop fs -find was introduced in Apache Hadoop 2.7.0. You are most likely on an older version (2.6.0-cdh5.4.1 is based on 2.6.0), so you don't have it yet. See HADOOP-8989 for more information.
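For anyone already on 2.7.0 or later, a minimal sketch of what the call might look like is below. The wrapper name hdfs_find_by_name and the /demo path are made up for illustration. As far as I know, the Hadoop find only understands a small set of expressions (-name, -iname, -print, -print0), so there is no direct counterpart to -type or -maxdepth.

```shell
# Hypothetical helper: list every HDFS path under a starting directory
# whose name matches a glob. Requires Hadoop 2.7.0 or later.
#   $1: starting directory in HDFS (e.g. /demo, an example path)
#   $2: glob pattern for the file or directory name
hdfs_find_by_name() {
  # Quoting "$2" keeps the local shell from expanding the glob.
  hadoop fs -find "$1" -name "$2" -print
}

# Example call (needs a reachable cluster):
#   hdfs_find_by_name /demo '*something*'
```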

In the meantime you can use

hdfs dfs -ls -R <pattern>

e.g.: hdfs dfs -ls -R '/demo/order*.*' (quoted so the local shell does not expand the glob before HDFS sees it)

This is not as powerful as find, of course: it lacks basics such as filtering by type or depth. From what I understand, people have been writing scripts around it to work around the limitation.
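As a rough sketch of such a script, the function below filters the output of hdfs dfs -ls -R to approximate find <root> -maxdepth N -type d -name "*something*". The function name filter_hdfs_dirs is made up. It assumes the usual eight-column listing (permissions, replication, owner, group, size, date, time, path), and it matches a plain substring rather than a full glob.

```shell
# Hypothetical filter for `hdfs dfs -ls -R` output read from stdin.
#   $1: substring the directory name must contain
#   $2: root directory that was listed (used to compute depth)
#   $3: maximum depth below the root, as with find's -maxdepth
filter_hdfs_dirs() {
  awk -v pat="$1" -v root="$2" -v maxdepth="$3" '
    BEGIN { sub(/\/$/, "", root) }       # "/demo/" -> "/demo", "/" -> ""
    $1 ~ /^d/ {                          # keep directories only
      path = $8                          # path starts at column 8
      for (i = 9; i <= NF; i++) path = path " " $i   # rejoin paths with spaces
      rel = substr(path, length(root) + 1)           # e.g. "/a/b"
      n = split(rel, parts, "/")                     # "", "a", "b" -> n = 3
      depth = n - 1
      if (depth >= 1 && depth <= maxdepth && index(parts[n], pat) > 0)
        print path
    }'
}

# Example call (needs a reachable cluster; /demo is an example path):
#   hdfs dfs -ls -R /demo | filter_hdfs_dirs something /demo 4
```

The listing is still produced for the whole tree and only trimmed afterwards, so on a large namespace this is slower than a native find would be.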

answered Sep 19 '22 01:09 by Legato