
Can Hadoop take input from multiple directories and files?

Tags: input, hadoop

I set FileInputFormat as the Hadoop input format, but the input path arg[0]+"/*/*/*" is reported as matching no files.

What I want is to read from multiple files laid out like this:

Directory1
---Directory11
   ---Directory111
        --f1.txt
        --f2.txt
---Directory12
Directory2
---Directory21

Is this possible in Hadoop? Thanks!

JudyJiang asked May 08 '13 16:05


1 Answer

You can take input from multiple directories and files by using the * (glob) operator. The most likely cause of the error is that the "arg[0]" argument is not the path you expect, so the pattern does not find any files.
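
To check whether the depth of the pattern is the problem, here is a small sketch that uses java.nio's PathMatcher as a stand-in for Hadoop's glob expansion. That substitution is an assumption made for illustration: Hadoop expands input globs through its own FileSystem API, and the syntax may differ in some details. The class name GlobDepthDemo, the TREE list, and the base directories passed in are hypothetical; the paths mirror the layout in the question.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustration only: java.nio glob matching stands in for Hadoop's glob expansion.
class GlobDepthDemo {

    // Hypothetical listing that mirrors the directory layout in the question.
    static final List<String> TREE = Arrays.asList(
        "Directory1/Directory11/Directory111/f1.txt",
        "Directory1/Directory11/Directory111/f2.txt",
        "Directory1/Directory12",
        "Directory2/Directory21");

    // Returns the entries of TREE matched by base + "/*/*/*".
    // Each "*" stays within a single path component and never crosses a "/".
    static List<String> matching(String base) {
        PathMatcher matcher =
            FileSystems.getDefault().getPathMatcher("glob:" + base + "/*/*/*");
        List<String> hits = new ArrayList<>();
        for (String entry : TREE) {
            if (matcher.matches(Paths.get(entry))) {
                hits.add(entry);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Base at the right depth: the pattern reaches f1.txt and f2.txt.
        System.out.println(matching("Directory1"));
        // Base with nothing three levels below it: the pattern matches nothing.
        System.out.println(matching("Directory2"));
    }
}
```

If the second case resembles what happens with your arg[0], printing the fully resolved pattern before submitting the job should make the mismatch visible.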

As an alternative, you can use FileInputFormat.addInputPath, or, if you need a separate input format or mapper for each path, the MultipleInputs class.

A basic example of adding a path:

FileInputFormat.addInputPath(job, myInputPath);

Here is an example of MultipleInputs

MultipleInputs.addInputPath(job, inputPath1, TextInputFormat.class, MyMapper.class);
MultipleInputs.addInputPath(job, inputPath2, TextInputFormat.class, MyOtherMapper.class);

This other question is very similar and has good answers: "Hadoop to reduce from multiple input formats".

greedybuddha answered Nov 15 '22 05:11