
Search/Find a file and file content in Hadoop

I am currently working on a project using Hadoop DFS.

  1. I notice there is no search or find command in Hadoop Shell. Is there a way to search and find a file (e.g. testfile.doc) in Hadoop DFS?

  2. Does Hadoop support file content search? If so, how do I do it? For example, I have many Word Doc files stored in HDFS, and I want to list which files contain the words "computer science".

What about in other Distributed File Systems? Is file content search a soft spot of distributed file systems?

leon asked Jun 09 '11 18:06


1 Answer

  1. You can list every path recursively and filter the listing by name: hdfs dfs -ls -R / | grep [search_term].
  2. It sounds like a MapReduce job might be suitable here. Here's something similar, but for text files. However, if these documents are small, you may run into inefficiencies. Each file is assigned to its own map task, so for small files the overhead of setting up the task can be significant compared to the time needed to process the file.
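
To make the second point more concrete, here is a minimal sketch of a mapper that could be passed to Hadoop Streaming with -mapper. It is not the example the answer links to, only an illustration under some assumptions: the inputs are plain text (Word documents are binary and would need a text-extraction step first), the function name emit_if_match and the SEARCH_PHRASE variable are made up for this sketch, and the input path is assumed to reach the mapper in the mapreduce_map_input_file environment variable (older releases used map_input_file).

```shell
#!/bin/sh
# Hypothetical streaming mapper: prints the name of the input file
# if any line on stdin contains the search phrase.

emit_if_match() {
  # SEARCH_PHRASE is an assumed variable for this sketch, not a Hadoop setting.
  phrase="${SEARCH_PHRASE:-computer science}"

  # Assumption: Hadoop Streaming exports the current input path under one of
  # these names; fall back to "unknown" when run outside of Hadoop.
  input_file="${mapreduce_map_input_file:-${map_input_file:-unknown}}"

  # -c counts matching lines (and reads all of stdin), -i ignores case,
  # -F treats the phrase as a fixed string rather than a regex.
  matches=$(grep -ciF -- "$phrase" || true)

  if [ "${matches:-0}" -gt 0 ]; then
    printf '%s\n' "$input_file"
  fi
}

# A real mapper script would end by calling the function:
#   emit_if_match
```

A large file can be split across several map tasks, so the same name may be printed more than once; removing duplicates in a reducer (or with sort -u on the job output) would handle that.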
ajduff574 answered Oct 02 '22 23:10