When building the jar for my MapReduce job, I am using the hadoop-local command. Instead of specifying the path of every file in my input folder individually, is there a way to pass all the files in the folder at once? The contents and number of files can change due to the nature of the job I am configuring, so I do not know in advance how many files there will be, only what their contents look like. Is there a way to pass all files from the input folder into my MapReduce program and then iterate over each file to compute a certain function, sending the results to the reducer? I am only using one map/reduce program and I am coding in Java. I am able to use the hadoop-moonshot command, but I am working with hadoop-local at the moment.
Thanks.
You can use the MultipleInputs class, which supports MapReduce jobs that have multiple input paths, with a different InputFormat and Mapper for each path.
If multiple input files are present in the same directory: by default, Hadoop does not read a directory recursively. But if input files such as data1, data2, etc. are present under /folder1 (including its sub-directories), set mapreduce.input.fileinputformat.input.dir.recursive to true.
Each mapper class works on a different set of inputs, but they all emit key-value pairs that are consumed by the same reducer.
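As a minimal sketch of the recursive case, assuming the /folder1 layout described above (the driver class name and paths are illustrative; the property key is the standard Hadoop 2.x one):

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class RecursiveInputDriver {
    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        // Tell FileInputFormat to descend into sub-directories of the input path.
        conf.setBoolean("mapreduce.input.fileinputformat.input.dir.recursive", true);
        Job job = Job.getInstance(conf, "read whole folder");
        // Passing the directory itself makes every file under it a map input.
        FileInputFormat.addInputPath(job, new Path("/folder1"));
    }
}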
You don't have to pass each individual file as input to a MapReduce job.
The FileInputFormat class already provides an API to accept a list of multiple files as input to a MapReduce program.
public static void setInputPaths(Job job,
Path... inputPaths)
throws IOException
Set the array of Paths as the list of inputs for the map-reduce job. Parameters:
job - the Job to modify
inputPaths - the Paths of the input directories or files for the map-reduce job.
Example code from the Apache tutorial:
Job job = Job.getInstance(conf, "word count");
FileInputFormat.addInputPath(job, new Path(args[0]));
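Note that args[0] can be a directory rather than a single file; FileInputFormat expands a directory to all the files inside it (names starting with _ or . are treated as hidden and skipped). Continuing the tutorial snippet with illustrative paths, the varargs setInputPaths shown above sets several inputs in one call:

// Replaces the job's input list; each entry may be a file or a whole directory.
FileInputFormat.setInputPaths(job,
    new Path("/user/hadoop/input"),          // a whole folder of files
    new Path("/user/hadoop/extra/one.txt")); // plus a single extra file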
MultipleInputs provides the API below:
public static void addInputPath(Job job,
Path path,
Class<? extends InputFormat> inputFormatClass,
Class<? extends Mapper> mapperClass)
Add a Path with a custom InputFormat and Mapper to the list of inputs for the map-reduce job.
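A usage sketch, assuming two hypothetical mapper classes FirstMapper and SecondMapper that emit the same key/value types, plus a hypothetical MyReducer:

// Each path gets its own mapper; all map outputs flow to the same reducer.
MultipleInputs.addInputPath(job, new Path("/data1"),
    TextInputFormat.class, FirstMapper.class);
MultipleInputs.addInputPath(job, new Path("/data2"),
    TextInputFormat.class, SecondMapper.class);
job.setReducerClass(MyReducer.class);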
Related SE question:
Can hadoop take input from multiple directories and files
Regarding multiple output paths, refer to the MultipleOutputs API:
FileOutputFormat.setOutputPath(job, outDir);

// Defines additional single text based output 'text' for the job
MultipleOutputs.addNamedOutput(job, "text", TextOutputFormat.class,
    LongWritable.class, Text.class);

// Defines additional sequence-file based output 'seq' for the job
MultipleOutputs.addNamedOutput(job, "seq", SequenceFileOutputFormat.class,
    LongWritable.class, Text.class);
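To write to those named outputs, the reducer creates a MultipleOutputs instance in setup() and must close it in cleanup(), or the extra output files are never flushed. A sketch matching the LongWritable/Text types registered above (the reducer class and its summing logic are illustrative):

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

public class MyReducer extends Reducer<Text, LongWritable, LongWritable, Text> {
    private MultipleOutputs<LongWritable, Text> mos;

    @Override
    protected void setup(Context context) {
        mos = new MultipleOutputs<>(context);
    }

    @Override
    protected void reduce(Text key, Iterable<LongWritable> values, Context context)
            throws IOException, InterruptedException {
        long sum = 0;
        for (LongWritable v : values) {
            sum += v.get();
        }
        // Route this record to the 'text' named output instead of the default one.
        mos.write("text", new LongWritable(sum), key);
    }

    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        mos.close(); // required so the named output files are flushed and committed
    }
}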
Have a look at these related SE questions regarding multiple output files:
Writing to multiple folders in hadoop?
hadoop method to send output to multiple directories