Using KeyValueTextInputFormat in Hadoop

Question

I am using Hadoop 1.0.1 for a project, and I want each line of my input .txt file to be split into the key and the value I need. For example:

Suppose I have a file test.txt whose content is

1, 10 10

I thought I could use KeyValueTextInputFormat with "," as the separator, so that after the input is read the key is "1" and the value is "10 10".

However, the result I get is that the whole line ends up in the key and the value is empty. I don't know where the problem is.

Please give me some help, thanks!

This is the example code:

public class WordCount {
    public class WordCountMapper extends Mapper<Text, Text, Text, Text> {

        public void map(Text key, Text value, Context context) throws IOException, InterruptedException {
            context.write(value, value);
            context.write(key, key);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("key.value.separator.in.input.line", ",");
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length != 2) {
            System.err.println("Usage: wordcount <in> <out>");
            System.exit(2);
        }
        Job job = new Job(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(WordCountMapper.class);
        job.setInputFormatClass(KeyValueTextInputFormat.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        KeyValueTextInputFormat.addInputPath(job, new Path(otherArgs[0]));
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Sasikanth Bharadwaj · Accepted Answer

The separator is specified through the property mapreduce.input.keyvaluelinerecordreader.key.value.separator. The default separator is the tab character ('\t'). If that property is not set, the tab default stays in effect; your line contains no tab, so the entire line becomes the key and the value is empty. So in your case, change the line

conf.set("key.value.separator.in.input.line",",");

to

conf.set("mapreduce.input.keyvaluelinerecordreader.key.value.separator",",");

and that should do the trick.
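To make the symptom easier to see, here is a minimal plain-Java sketch of the splitting rule as I understand it: the line is cut at the first occurrence of the separator, and if the separator does not occur at all, the whole line is the key and the value is empty. This is not Hadoop's code; the class name KeyValueSplitSketch and the helper split are made up for illustration, and it works on a String rather than on bytes.

```java
// Illustrative sketch only; KeyValueSplitSketch and split() are hypothetical names,
// not part of Hadoop. It mimics splitting a line at the first separator.
class KeyValueSplitSketch {

    // Returns {key, value}. If the separator is absent, the whole line is the key
    // and the value is the empty string.
    static String[] split(String line, char separator) {
        int pos = line.indexOf(separator);
        if (pos < 0) {
            return new String[] { line, "" };
        }
        return new String[] { line.substring(0, pos), line.substring(pos + 1) };
    }

    public static void main(String[] args) {
        String line = "1, 10 10";

        // Separator left at the tab default: no tab in the line.
        String[] withTab = split(line, '\t');
        System.out.println("tab:   key=[" + withTab[0] + "] value=[" + withTab[1] + "]");

        // Separator set to a comma: the line is cut at the first comma.
        String[] withComma = split(line, ',');
        System.out.println("comma: key=[" + withComma[0] + "] value=[" + withComma[1] + "]");
    }
}
```

With the tab separator the sketch puts "1, 10 10" in the key and leaves the value empty, which matches what the question describes. With the comma it gives key "1" and value " 10 10"; note the leading space, since nothing is trimmed after the separator, so you may want to call trim() on the value in your mapper.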
