I have a situation where I have multiple files (100+, 2-3 MB each) in compressed gz format, spread across multiple directories. For example:
A1/B1/C1/part-0000.gz
A2/B2/C2/part-0000.gz
A1/B1/C1/part-0001.gz
I have to feed all these files into one Map job. From what I see, to use MultipleFileInputFormat all input files need to be in the same directory. Is it possible to pass multiple directories directly into the job?
If not, is it possible to efficiently move these files into one directory without naming conflicts, or to merge them into a single compressed gz file?
Note: I am using plain Java to implement the Mapper, not Pig or Hadoop Streaming.
Any help regarding the above issue will be deeply appreciated.
Thanks,
Ankit
Yes, it is possible to pass multiple input directories to a single Hadoop MapReduce job.
You don't have to pass each file individually. The FileInputFormat class already provides an API to build up a list of inputs: FileInputFormat.addInputPath(Job job, Path path) adds one path to the list of inputs for the map-reduce job, and you can call it once per directory.
If the input files are nested inside subdirectories: by default Hadoop does not read a directory recursively. If files like data1, data2, etc. sit in subdirectories under /folder1, set the property mapreduce.input.fileinputformat.input.dir.recursive to true so each input path is scanned recursively.
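A sketch of how that would look in a driver, assuming the standard Hadoop 2.x API (the job name and /folder1 path are placeholders for your own values); treat this as a configuration fragment rather than a complete driver:

```java
Configuration conf = new Configuration();
// Make FileInputFormat descend into subdirectories of each input path.
conf.setBoolean("mapreduce.input.fileinputformat.input.dir.recursive", true);
Job job = Job.getInstance(conf, "gz-input-job");  // job name is illustrative
// One call per top-level directory; nested part-*.gz files are now picked up.
FileInputFormat.addInputPath(job, new Path("/folder1"));
```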
Alternatively, FileInputFormat.addInputPaths() takes a Job plus a comma-separated list of paths, like
FileInputFormat.addInputPaths(job, "foo/file1.gz,bar/file2.gz")
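Since addInputPaths() splits its argument on commas, you can assemble that string from your known directories with plain Java before handing it to the driver. A minimal sketch (the class and method names here are hypothetical helpers, not part of Hadoop):

```java
import java.util.Arrays;
import java.util.List;

public class InputPathJoiner {
    // Builds the comma-separated string that
    // FileInputFormat.addInputPaths(job, ...) expects.
    // Individual paths must not themselves contain commas.
    public static String joinPaths(List<String> paths) {
        return String.join(",", paths);
    }

    public static void main(String[] args) {
        String joined = joinPaths(Arrays.asList(
                "A1/B1/C1/part-0000.gz",
                "A2/B2/C2/part-0000.gz",
                "A1/B1/C1/part-0001.gz"));
        System.out.println(joined);
        // In the driver you would then call:
        // FileInputFormat.addInputPaths(job, joined);
    }
}
```

Passing directories instead of individual files works too, since FileInputFormat expands each directory into the files it contains.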