I'm chaining multiple MapReduce jobs and want to pass along/store some meta information (e.g. configuration or name of original input) with the results. At least the file "_SUCCESS" and also anything in the directory "_logs" seams to be ignored.
Are there any filename patterns which are by default ignored by the InputReader
? Or is this just a fixed limited list?
The FileInputFormat
uses the following hiddenFileFilter by default:
private static final PathFilter hiddenFileFilter = new PathFilter(){
public boolean accept(Path p){
String name = p.getName();
return !name.startsWith("_") && !name.startsWith(".");
}
};
So if you uses any FileInputFormat
(such as TextInputFormat
, KeyValueTextInputFormat
, SequenceFileInputFormat
), the hidden files (the file name starts with "_" or ".") will be ignored.
You can use FileInputFormat.setInputPathFilter to set your custom PathFilter
. Remember that the hiddenFileFilter
is always active.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With