I'm using Cloudera Hadoop. I'm able to run a simple MapReduce program where I provide a file as input to the MapReduce program.
This file contains all the other files to be processed by the mapper function.
But I'm stuck at one point.
/folder1
- file1.txt
- file2.txt
- file3.txt
How can I specify the input path to the MapReduce program as "/folder1", so that it starts processing each file inside that directory?
Any ideas?
EDIT:
1) Initially, I provided inputFile.txt as input to the MapReduce program. It was working perfectly.
>inputFile.txt
file1.txt
file2.txt
file3.txt
2) But now, instead of giving an input file, I want to provide an input directory as args[0] on the command line.
hadoop jar ABC.jar /folder1 /output
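For context, a minimal driver sketch that wires args[0] (the input directory) and args[1] (the output directory) into the job; the class name Folder1Driver is illustrative, and the mapper/reducer setup is left out:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class Folder1Driver {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "process folder1");
        job.setJarByClass(Folder1Driver.class);

        // setMapperClass / setReducerClass / output key-value classes go here

        // args[0] = /folder1, args[1] = /output, matching "hadoop jar ABC.jar /folder1 /output"
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}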
The MultipleInputs class supports MapReduce jobs that have multiple input paths, with a different InputFormat and Mapper for each path.
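A rough sketch of MultipleInputs usage; the paths, FirstMapper and SecondMapper are illustrative placeholders, and MultipleInputs lives in org.apache.hadoop.mapreduce.lib.input:

// Each input path gets its own InputFormat and Mapper class.
MultipleInputs.addInputPath(job, new Path("/folder1"),
        TextInputFormat.class, FirstMapper.class);
MultipleInputs.addInputPath(job, new Path("/folder2"),
        KeyValueTextInputFormat.class, SecondMapper.class);
// No separate FileInputFormat.addInputPath call is needed when MultipleInputs is used.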
How do you specify more than one directory in a MapReduce job? To take more than one folder as input, you can simply mention the separate paths while running the job. Say, for example, you have two files: /user/hduser/input1/a.
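As a sketch, assuming two illustrative directories under /user/hduser, the driver can add each folder separately, or pass a comma-separated list:

FileInputFormat.addInputPath(job, new Path("/user/hduser/input1"));
FileInputFormat.addInputPath(job, new Path("/user/hduser/input2"));

// or, equivalently, as one comma-separated string:
FileInputFormat.addInputPaths(job, "/user/hduser/input1,/user/hduser/input2");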
MapReduce distributes fragments of data across the nodes in a Hadoop cluster. The goal is to split a dataset into chunks and process those chunks at the same time. The parallel processing on multiple machines greatly increases the speed of handling even petabytes of data.
What will happen if the output directory already exists for a MapReduce job? The job will not overwrite the files in that directory; it will throw an error stating that the output directory already exists.
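A common workaround is to remove the output directory from the driver before submitting the job; a sketch, assuming args[1] is the output path and using org.apache.hadoop.fs.FileSystem (note this deletes the previous run's output):

Path outputDir = new Path(args[1]);                      // e.g. /output
FileSystem fs = FileSystem.get(job.getConfiguration());
if (fs.exists(outputDir)) {
    fs.delete(outputDir, true);                          // true = delete recursively
}
FileOutputFormat.setOutputPath(job, outputDir);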
The problem is that FileInputFormat doesn't read files recursively from the input path directory.
Solution: use the following line
FileInputFormat.setInputDirRecursive(job, true);
before this line in your MapReduce code:
FileInputFormat.addInputPath(job, new Path(args[0]));
You can check here for which version it was fixed.
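Put together, the relevant driver excerpt reads (the comments only restate the ordering above, and args[0]/args[1] match the command line from the question):

FileInputFormat.setInputDirRecursive(job, true);        // enable recursive listing first
FileInputFormat.addInputPath(job, new Path(args[0]));   // then add /folder1 as the input path
FileOutputFormat.setOutputPath(job, new Path(args[1])); // /output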