I'm running a Hadoop job (using Hive, actually) that is supposed to uniq
lines in many text files. In the reduce step, it chooses the most recently timestamped record for each key.
Does Hadoop guarantee that every record with the same key, output by the map step, will go to a single reducer, even if many reducers are running across a cluster?
I worry that after the shuffle, the mapper output might be split in the middle of a set of records that share the same key.
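For context, here is a minimal sketch of the kind of reducer described above in plain MapReduce Java. The class name and the assumption that each value arrives as a tab-separated "epochMillis\trecord" string are hypothetical, not from the actual job:

```java
import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Sketch only: keep the most recently timestamped record for each key.
// Assumes each value is "<epochMillis>\t<record>"; adapt to the real layout.
public class LatestRecordReducer extends Reducer<Text, Text, Text, Text> {
    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        long latestTs = Long.MIN_VALUE;
        String latestRecord = null;
        for (Text value : values) {
            String[] parts = value.toString().split("\t", 2);
            long ts = Long.parseLong(parts[0]);
            if (ts > latestTs) {
                latestTs = ts;
                latestRecord = parts[1];
            }
        }
        if (latestRecord != null) {
            context.write(key, new Text(latestRecord));
        }
    }
}
```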
Number of mappers per MapReduce job: the number of mappers depends on the number of InputSplits generated by the InputFormat (its getSplits method). If you have a 640 MB file and the data block size is 128 MB, then 5 mappers run for that MapReduce job.
If there are a lot of key-value pairs to merge, a single reducer might take too much time. To avoid one reducer machine becoming a bottleneck, we use multiple reducers. With multiple reducers, each node running a mapper partitions its key-value pairs into one bucket per reducer as it sorts its output.
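The bucket for a given key is chosen by the partitioner. Hadoop's default, HashPartitioner, is essentially this one-liner, which is why equal keys always land in the same bucket no matter which mapper emitted them:

```java
import org.apache.hadoop.mapreduce.Partitioner;

// Essentially the stock org.apache.hadoop.mapreduce.lib.partition.HashPartitioner:
// the partition depends only on the key's hashCode and the reducer count,
// so every occurrence of a given key maps to the same reducer index.
public class HashPartitioner<K, V> extends Partitioner<K, V> {
    @Override
    public int getPartition(K key, V value, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }
}
```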
The reduce phase cannot start while any mapper is still in progress (reducers may begin fetching map output earlier, but the reduce function only runs once all maps have finished). All map output values that share a key are assigned to a single reducer, which then aggregates the values for that key.
If we set the number of reducers to 0 (by calling job.setNumReduceTasks(0)), then no reducer executes and no aggregation takes place. This is called a "map-only job" in Hadoop: each mapper does all the work on its InputSplit, and there is no reduce step.
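A minimal driver sketch for such a map-only job (the class name, job name, and argument-based paths are placeholders, not from the question):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MapOnlyDriver {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "map-only-example");
        job.setJarByClass(MapOnlyDriver.class);
        // Zero reducers: map output is written straight to the output path,
        // skipping partitioning, shuffle, sort, and reduce entirely.
        job.setNumReduceTasks(0);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```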
All values for a key are sent to the same reducer. See this Yahoo! tutorial for more discussion.
This behavior is determined by the partitioner, and might not be true if you use a partitioner other than the default.
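To illustrate the caveat, here is a hypothetical custom partitioner that routes on only the first tab-separated field of the key. It still sends identical keys to one reducer, because the partition is a pure function of the key; a partitioner that consulted the value or random state would break that guarantee:

```java
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

// Hypothetical example: partition by the first tab-separated field of the key,
// so all keys sharing that prefix go to the same reducer. Identical full keys
// necessarily share a prefix, so the same-key guarantee still holds here.
public class FirstFieldPartitioner extends Partitioner<Text, Text> {
    @Override
    public int getPartition(Text key, Text value, int numReduceTasks) {
        String firstField = key.toString().split("\t", 2)[0];
        return (firstField.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }
}
```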