This may be a basic question, but I could not find an answer for it on Google.
I have a map-reduce job that creates multiple output files in its output directory.
My Java application executes this job on a remote Hadoop cluster, and after the job finishes it needs to read the output programmatically using the org.apache.hadoop.fs.FileSystem API. Is that possible?
The application knows the output directory, but not the names of the output files the map-reduce job generated. There seems to be no way to programmatically list the contents of a directory through the Hadoop file system API. How can the output files be read?
It seems like such a commonplace scenario that I am sure it has a solution, but I must be missing something obvious.
The MapReduce programming paradigm lets you scale the processing of unstructured data across hundreds or thousands of commodity servers in an Apache Hadoop cluster. A job has two main phases: the map phase, which transforms the input records, and the reduce phase, which aggregates them.
In a Hadoop MapReduce job, each reducer produces one output file named part-r-nnnnn, where nnnnn is a zero-based sequence number; the number of such files matches the number of reducers configured for the job.
The reducer output is the final output and is stored in HDFS. Typically the reducer performs aggregation or summation-style computation.
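The naming convention above can be sketched as follows. This is a minimal, self-contained illustration of the default output file names (a zero-padded five-digit reducer index), not Hadoop code:

public class PartFileNames {
    // Build the file name a reducer with the given index writes by default.
    static String partFileName(int reducerIndex) {
        return String.format("part-r-%05d", reducerIndex);
    }

    public static void main(String[] args) {
        // A job configured with 3 reducers produces three output files:
        for (int i = 0; i < 3; i++) {
            System.out.println(partFileName(i));
        }
        // prints part-r-00000, part-r-00001, part-r-00002
    }
}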
The method you are looking for is FileSystem#listStatus(Path). It returns all entries inside a Path as a FileStatus array. You can then loop over them, take each entry's Path, and read the file.
FileStatus[] fss = fs.listStatus(new Path("/"));
for (FileStatus status : fss) {
    Path path = status.getPath();
    // Skip directories and job bookkeeping files such as _SUCCESS
    if (status.isDirectory() || path.getName().startsWith("_")) {
        continue;
    }
    SequenceFile.Reader reader = new SequenceFile.Reader(fs, path, conf);
    IntWritable key = new IntWritable();
    IntWritable value = new IntWritable();
    while (reader.next(key, value)) {
        System.out.println(key.get() + " | " + value.get());
    }
    reader.close();
}
For Hadoop 2.x you can set up the reader like this:
SequenceFile.Reader reader =
    new SequenceFile.Reader(conf, SequenceFile.Reader.file(path));
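Putting it together for Hadoop 2.x, a minimal sketch might look like the following. The output directory path and the IntWritable key/value types are assumptions and should match what your job actually writes; the PathFilter overload of listStatus is used to pick out only the part-* files:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.SequenceFile;

public class ReadJobOutput {
    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Assumed output directory of the finished job.
        Path outputDir = new Path("/user/hadoop/job-output");

        // List only reducer output files, skipping _SUCCESS and hidden files.
        FileStatus[] files = fs.listStatus(outputDir,
                path -> path.getName().startsWith("part-"));

        IntWritable key = new IntWritable();
        IntWritable value = new IntWritable();
        for (FileStatus status : files) {
            try (SequenceFile.Reader reader = new SequenceFile.Reader(
                    conf, SequenceFile.Reader.file(status.getPath()))) {
                while (reader.next(key, value)) {
                    System.out.println(key.get() + " | " + value.get());
                }
            }
        }
    }
}

The try-with-resources block ensures each reader is closed even if reading a file fails partway through.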