I am using the WordCount example and in the Reduce function, I need to get the file name.
public static class Reduce extends MapReduceBase implements Reducer<Text, IntWritable, Text, IntWritable> {
    public void reduce(Text key, Iterator<IntWritable> values, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
        int sum = 0;
        while (values.hasNext()) {
            sum += values.next().get();
        }
        String filename = ((FileSplit)(.getContext()).getInputSplit()).getPath().getName();
        // ----------------------------^ I need to get the context and filename!
        key.set(key.toString() + " (" + filename + ")");
        output.collect(key, new IntWritable(sum));
    }
}
Here is my modified code; I want the filename printed next to each word. I tried following Java Hadoop: How can I create mappers that take as input files and give an output which is the number of lines in each file? but I couldn't get the context object.
I am new to Hadoop and could use some help with this.
You can't get context, because context is a construct of the "new API", and you are using the "old API".
Check out this word count example instead: http://wiki.apache.org/hadoop/WordCount
See the signature of the reduce function in this case:
public void reduce(Text key, Iterable<IntWritable> values, Context context)
See! The context! Notice that this example imports from org.apache.hadoop.mapreduce instead of org.apache.hadoop.mapred.
This is a common issue for new Hadoop users, so don't feel bad. In general you want to stick to the new API for a number of reasons. But be very careful with examples that you find, and realize that the new API and old API are not interoperable (e.g., you can't mix a new-API mapper with an old-API reducer).
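In case it helps, here is a minimal sketch of the word count mapper written against the new API, tagging each word with its source filename via setup(). The class name TaggingMapper is illustrative, not from the original post, and this assumes a file-based input format so the split can be cast to FileSplit:

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

public class TaggingMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable one = new IntWritable(1);
    private final Text word = new Text();
    private String fileName;

    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        // In the new API, the mapper's Context exposes the input split directly.
        fileName = ((FileSplit) context.getInputSplit()).getPath().getName();
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        StringTokenizer tokenizer = new StringTokenizer(value.toString());
        while (tokenizer.hasMoreTokens()) {
            // Attach the filename to the word here, so the reducer never
            // needs to know which file a count came from.
            word.set(tokenizer.nextToken() + " (" + fileName + ")");
            context.write(word, one);
        }
    }
}
```

Tagging in the mapper sidesteps the original problem entirely: by the time values reach the reducer they may come from many splits, so the reducer is the wrong place to ask "which file?" anyway.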
Using the old MR API (org.apache.hadoop.mapred package), add the below to the mapper/reducer class:
private String fileName;

public void configure(JobConf job) {
    // In the old API, the path of the current input file is exposed
    // as the job property "map.input.file".
    fileName = job.get("map.input.file");
}
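For completeness, here is a sketch of a full old-API word count mapper built around that configure() hook. The class name FileNameMapper is illustrative:

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class FileNameMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable one = new IntWritable(1);
    private final Text word = new Text();
    private String fileName;

    @Override
    public void configure(JobConf job) {
        // Cached once per task; every record in this task comes from the same file.
        fileName = job.get("map.input.file");
    }

    @Override
    public void map(LongWritable key, Text value,
            OutputCollector<Text, IntWritable> output, Reporter reporter)
            throws IOException {
        StringTokenizer tokenizer = new StringTokenizer(value.toString());
        while (tokenizer.hasMoreTokens()) {
            word.set(tokenizer.nextToken() + " (" + fileName + ")");
            output.collect(word, one);
        }
    }
}
```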
Using the new MR API (org.apache.hadoop.mapreduce package), add the below to the mapper class (the reducer's Context does not expose an input split, so this only works on the map side):
private String fileName;

protected void setup(Context context) throws java.io.IOException, java.lang.InterruptedException {
    // The input split for this map task carries the path of the file being read.
    fileName = ((FileSplit) context.getInputSplit()).getPath().toString();
}
I used this approach and it works:
public static class Map extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
        // With the old API, the Reporter gives access to the current input split.
        FileSplit fileSplit = (FileSplit) reporter.getInputSplit();
        String filename = fileSplit.getPath().getName();
        // filename is now available to include in the output key if you want it.
        StringTokenizer tokenizer = new StringTokenizer(value.toString());
        while (tokenizer.hasMoreTokens()) {
            word.set(tokenizer.nextToken());
            output.collect(word, one);
        }
    }
}
Let me know if I can improve it!