I'm chaining multiple MapReduce jobs and want to pass along/store some meta information (e.g. configuration or name of original input) with the results. At least the file "_SUCCESS" and also anything in the directory "_logs" seams to be ignored.
Are there any filename patterns which are by default ignored by the InputReader? Or is this just a fixed limited list?
The FileInputFormat uses the following hiddenFileFilter by default:
private static final PathFilter hiddenFileFilter = new PathFilter(){
public boolean accept(Path p){
String name = p.getName();
return !name.startsWith("_") && !name.startsWith(".");
}
};
So if you uses any FileInputFormat (such as TextInputFormat, KeyValueTextInputFormat, SequenceFileInputFormat), the hidden files (the file name starts with "_" or ".") will be ignored.
You can use FileInputFormat.setInputPathFilter to set your custom PathFilter. Remember that the hiddenFileFilter is always active.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With