
Which files are ignored as input by mapper?

I'm chaining multiple MapReduce jobs and want to pass along or store some meta information (e.g. configuration or the name of the original input) with the results. At least the file "_SUCCESS" and anything in the directory "_logs" seem to be ignored.

Are there any filename patterns that the InputReader ignores by default? Or is this just a fixed, limited list?

Mario L asked Nov 07 '13 07:11



1 Answer

The FileInputFormat uses the following hiddenFileFilter by default:

  private static final PathFilter hiddenFileFilter = new PathFilter() {
    public boolean accept(Path p) {
      String name = p.getName();
      // Reject any path whose last component starts with "_" or "."
      return !name.startsWith("_") && !name.startsWith(".");
    }
  };

So if you use any FileInputFormat (such as TextInputFormat, KeyValueTextInputFormat, or SequenceFileInputFormat), hidden files (those whose names start with "_" or ".") will be ignored.
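For the use case in the question, this suggests that a metadata file written next to the results is skipped by the next job as long as its name starts with "_" or ".". The following is a minimal plain-Java sketch of that name check, not Hadoop code: the class `HiddenNameCheck`, the method `isVisibleToInputFormat`, and the sample file names (including `_meta.properties`) are made up for illustration, and it only mirrors the string test from the filter quoted above.

```java
// Plain-Java model of the name test in hiddenFileFilter; it does not use Hadoop.
// Class, method, and file names here are hypothetical.
final class HiddenNameCheck {

    // A file name is read as input only if it starts with neither "_" nor ".".
    static boolean isVisibleToInputFormat(String fileName) {
        return !fileName.startsWith("_") && !fileName.startsWith(".");
    }

    public static void main(String[] args) {
        // Hypothetical listing of a previous job's output directory.
        String[] names = {
            "_SUCCESS", "_logs", "_meta.properties",
            ".part-r-00000.crc", "part-r-00000", "meta.properties"
        };
        for (String name : names) {
            String verdict = isVisibleToInputFormat(name) ? "read as input" : "ignored";
            System.out.println(name + " -> " + verdict);
        }
    }
}
```

In this sketch only `part-r-00000` and `meta.properties` count as input, so a metadata file without the leading underscore would be fed to the mapper like any other data file.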

You can use FileInputFormat.setInputPathFilter to set a custom PathFilter. Remember that the hiddenFileFilter is always active: your filter is applied in addition to it, not instead of it, so it can only exclude more files.
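To illustrate what "always active" means in practice, here is a small plain-Java model of how a custom filter combines with the hidden-file check. It is a sketch only and deliberately avoids the Hadoop classes: `CombinedFilterSketch`, `effectiveFilter`, and the sample filters are hypothetical names, and `Predicate<String>` on file names stands in for `PathFilter` on paths. The assumption it encodes is that a file is accepted only if both filters accept it.

```java
import java.util.function.Predicate;

// Plain-Java model of "hidden-file filter AND custom filter"; not Hadoop code.
final class CombinedFilterSketch {

    // Same name test as the hiddenFileFilter quoted in the answer.
    static final Predicate<String> HIDDEN_FILE_FILTER =
            name -> !name.startsWith("_") && !name.startsWith(".");

    // The custom filter is combined with the hidden-file filter;
    // a name must pass both to be accepted.
    static Predicate<String> effectiveFilter(Predicate<String> custom) {
        return HIDDEN_FILE_FILTER.and(custom);
    }

    public static void main(String[] args) {
        // A custom filter that accepts everything cannot re-enable hidden files.
        Predicate<String> acceptEverything = name -> true;
        System.out.println(effectiveFilter(acceptEverything).test("_SUCCESS"));
        System.out.println(effectiveFilter(acceptEverything).test("part-r-00000"));

        // A stricter custom filter narrows the visible files further.
        Predicate<String> onlyTextFiles = name -> name.endsWith(".txt");
        System.out.println(effectiveFilter(onlyTextFiles).test("part-r-00000"));
        System.out.println(effectiveFilter(onlyTextFiles).test("notes.txt"));
    }
}
```

The practical consequence is that a custom filter is useful for excluding additional files, but not for reading files such as `_SUCCESS` back in.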

zsxwing answered Dec 01 '22 01:12
