My map function produces:
Key\tValue
where Value = List(value1, value2, value3)
Then my reduce function produces:
Key\tCSV-Line
For example:
2323232-2322 fdsfs,sdfs,dfsfs,0,0,0,2,fsda,3,23,3,s,
2323555-22222 dfasd,sdfas,adfs,0,0,2,0,fasafa,2,23,s
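For context, here is a minimal sketch of what such a reducer might look like, assuming Text values; the class name is hypothetical and this is not my actual code (see the pastebin link below):
import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Hypothetical reducer: joins the grouped values for a key into one CSV line.
public static class CsvLineReducer extends Reducer<Text, Text, Text, Text> {
    @Override
    public void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        StringBuilder csv = new StringBuilder();
        for (Text value : values) {
            if (csv.length() > 0) {
                csv.append(',');
            }
            csv.append(value.toString());
        }
        // Currently emits "Key\tCSV-Line"; the question below is whether
        // the key part can be dropped.
        context.write(key, new Text(csv.toString()));
    }
}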
Example raw data:
232342|@3423@|34343|sfasdfasdF|433443|Sfasfdas|324343
x 1000
Anyway, I want to eliminate the keys at the beginning of that output so my client can do a straight import into MySQL. I have about 50 data files. My question is: once the map phase is done and the reducer starts, does the reducer need to print the key out with the value, or can I just print the value?
More information:
This code might shed some better light on the situation: http://pastebin.ca/2410217
This is roughly what I plan to do.
All inputs and outputs are stored in HDFS. While a map step is mandatory (it filters and transforms the initial data, whose output the framework then sorts), the reduce step is optional.
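To illustrate that last point, here is a hedged sketch of a reduce-less (map-only) job: setting the number of reduce tasks to zero makes the framework write map output straight to HDFS. The driver class name is an assumption; the base Mapper class is used as an identity pass-through.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MapOnlyDriver {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "map-only example");
        job.setJarByClass(MapOnlyDriver.class);
        job.setMapperClass(Mapper.class);  // base Mapper = identity pass-through
        job.setNumReduceTasks(0);          // zero reducers: map output goes straight to HDFS
        job.setOutputKeyClass(LongWritable.class);  // TextInputFormat's key type
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}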
Hadoop MapReduce uses key-value pairs to process data efficiently; the MapReduce concept itself derives from Google's white papers. Key-value pairs are not part of the input data as such; rather, the input is split into keys and values before being handed to the mapper.
A Combiner, also known as a semi-reducer, is an optional class that accepts the outputs of the Map class and passes its own output key-value pairs on to the Reducer class.
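As a hedged illustration, here is a WordCount-style combiner; the class name is hypothetical:
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Hypothetical combiner: pre-sums counts on each mapper's local output.
// Its output types must match the map output types, because the framework
// may run it zero, one, or many times between map and reduce.
public static class IntSumCombiner
        extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable value : values) {
            sum += value.get();
        }
        context.write(key, new IntWritable(sum));
    }
}

// Wiring in the driver:
// job.setCombinerClass(IntSumCombiner.class);
Note that a reducer which changes the key type, like the NullWritable example further down this page, cannot double as a combiner, since a combiner's output must still look like map output.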
The key-value pair is the record unit that Hadoop MapReduce accepts for execution. Hadoop is used mainly for data analysis and deals with structured, unstructured, and semi-structured data. If the schema is static, we can work directly on the columns instead of on generic key-value pairs.
If you do not want to emit the key, set it to NullWritable in your code. For example:
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public static class TokenCounterReducer
        extends Reducer<Text, IntWritable, NullWritable, IntWritable> {
    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable value : values) {
            sum += value.get();
        }
        // Emit only the value; NullWritable suppresses the key entirely.
        context.write(NullWritable.get(), new IntWritable(sum));
        // context.write(key, new IntWritable(sum));
    }
}
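One hedged follow-up, assuming the standard Job driver API: the job's declared output types must match what the reducer now emits, or the job will fail with a type mismatch at runtime.
// In the driver, mirror the reducer's new output types.
job.setOutputKeyClass(NullWritable.class);
job.setOutputValueClass(IntWritable.class);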
Let me know if this is not what you need; I'll update the answer accordingly.
Your reducer can emit a line without a \t, or, in your case, just what you're calling the value. Unfortunately, Hadoop Streaming will interpret this as a key with a null value and automatically append a delimiter (\t by default) to the end of each line. You can change what this delimiter is, but when I played around with this I could not get it to not append a delimiter at all. I don't remember the exact details, but based on this (Hadoop: key and value are tab separated in the output file. how to do it semicolon-separated?) I think the property is mapred.textoutputformat.separator. My solution was to strip the \t at the end of each line as I pulled the file back:
hadoop fs -cat hadoopfile | perl -pe 's/\t$//' > destfile
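For a plain Java (non-streaming) job, here is a hedged sketch of setting that separator on the job configuration; the property name changed across Hadoop versions, so treat both names below as assumptions to verify against your release:
// Hedged: older releases read "mapred.textoutputformat.separator",
// newer ones "mapreduce.output.textoutputformat.separator".
Configuration conf = job.getConfiguration();
conf.set("mapred.textoutputformat.separator", ";");
conf.set("mapreduce.output.textoutputformat.separator", ";");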