
Hadoop: key and value are tab-separated in the output file. How do I make them semicolon-separated?

I think the title already explains my question. I would like to change

key (tab space) value

into

key;value

in all the output files that the reducers generate from the mappers' output.

I could not find good documentation on this using Google. Can anyone please share a snippet of code showing how to achieve this?

Bob asked Jun 14 '12 11:06


2 Answers

For lack of better documentation, here's what I've collected. It sets every separator key I have come across:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public static void setTextOutputFormatSeparator(final Job job, final String separator) {
        // Job works on its own copy of the Configuration, so set the keys on that copy
        final Configuration conf = job.getConfiguration();

        conf.set("mapred.textoutputformat.separator", separator);           // prior to Hadoop 2 (YARN)
        conf.set("mapreduce.textoutputformat.separator", separator);        // reported for Hadoop 2+ (YARN)
        conf.set("mapreduce.output.textoutputformat.separator", separator); // replacement key listed in Hadoop's deprecated-properties table
        conf.set("mapreduce.output.key.field.separator", separator);
        conf.set("mapred.textoutputformat.separatorText", separator);       // ?
    }

xgMz answered Nov 03 '22 05:11


Set the configuration property mapred.textoutputformat.separator to ";"
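
As a rough illustration of where that property might be set, here is a minimal driver sketch. It assumes the newer `org.apache.hadoop.mapreduce` API and Hadoop 2+, where `Job.getInstance` is available; on older releases the `Job` constructor would be used instead. The class name `SemicolonOutputDriver` and the job name are hypothetical, and the mapper/reducer setup is left out. The sketch sets both the older key named above and its `mapreduce.output.*` counterpart, since which one a cluster reads depends on the Hadoop version.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

// Hypothetical driver class (not from the original answers); needs the Hadoop jars on the classpath.
public class SemicolonOutputDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Set the separator before creating the Job, because Job copies the Configuration.
        conf.set("mapred.textoutputformat.separator", ";");           // older key
        conf.set("mapreduce.output.textoutputformat.separator", ";"); // Hadoop 2+ key

        Job job = Job.getInstance(conf, "semicolon-separated output");
        job.setJarByClass(SemicolonOutputDriver.class);

        // The separator only applies to TextOutputFormat (the default output format).
        job.setOutputFormatClass(TextOutputFormat.class);

        // Mapper, reducer and output key/value classes would be configured here.

        FileInputFormat.addInputPath(job, new Path(args[0]));   // input directory
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // output directory (must not exist yet)

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

With this in place, each line in the `part-*` output files should read `key;value` instead of `key<TAB>value`. If the property is set after the job has been created, it has to go through `job.getConfiguration()` rather than the original `conf` object.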

Chris White answered Nov 03 '22 03:11