My file structure is as follows:
/indir/somedir1/somefile
/indir/somedir1/someotherfile...
/indir/somedir2/somefile
/indir/somedir2/someotherfile...
I now want to pass everything recursively into an MR job, and I am using the new API. So I did:
FileInputFormat.setInputPaths(job, new Path("/indir"));
But the job fails with:
Error: java.io.FileNotFoundException: Path is not a file: /indir/somedir1
I am using Hadoop 2.4, and this post states that Hadoop 2's new API does not support recursive file listing. But I wonder how that can be, since throwing a large nested directory structure at a Hadoop job seems like the most ordinary thing in the world...
So, is this intended behavior or a bug? Either way, is there a workaround other than falling back to the old API?
I found the answer myself. In the JIRA linked from the mentioned forum post, there are two comments describing how to do it right:
- Set mapreduce.input.fileinputformat.input.dir.recursive to true (the comment states mapred.input.dir.recursive, but that key is deprecated).
- Use FileInputFormat.addInputPath to specify the input directory.
With these changes, it works.
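Putting both steps together, here is a minimal driver sketch. The class name RecursiveInputDriver, the job name, and the /outdir output path are made up for illustration, and the mapper/reducer setup is omitted:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class RecursiveInputDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Tell FileInputFormat to descend into subdirectories of the input path.
        conf.setBoolean("mapreduce.input.fileinputformat.input.dir.recursive", true);

        Job job = Job.getInstance(conf, "recursive input example");
        job.setJarByClass(RecursiveInputDriver.class);
        // Mapper, reducer, and output key/value types omitted for brevity.

        // Add the top-level directory; files under /indir/somedir1,
        // /indir/somedir2, ... are now collected as input.
        FileInputFormat.addInputPath(job, new Path("/indir"));
        FileOutputFormat.setOutputPath(job, new Path("/outdir"));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}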
Another way to configure it is via the FileInputFormat class:
FileInputFormat.setInputDirRecursive(job, true);
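In the driver sketch above, this call replaces the conf.setBoolean(...) line. It has to come after Job.getInstance(...), because it writes the same mapreduce.input.fileinputformat.input.dir.recursive flag into the job's own configuration:

Job job = Job.getInstance(new Configuration(), "recursive input example");
FileInputFormat.setInputDirRecursive(job, true); // same effect as setting the config key
FileInputFormat.addInputPath(job, new Path("/indir"));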