Can any one explain me how secondary sorting works in hadoop ? Why must one use <code>GroupingComparator</code> and how does it work in hadoop ? I was going through the link given below and got doubt on how groupcomapator works. Can any one explain me how grouping comparator works? http://www.bigdataspeak.com/2013/02/hadoop-how-to-do-secondary-sort-on_25.html

I find it easy to understand certain concepts with help of diagrams and this is certainly one of them. Lets assume that our secondary sorting is on a composite key made out of Last Name and First Name. <img src="https://i.stack.imgur.com/T58wv.png" alt="Composite Key"> With the composite key out of the way, now lets look at the secondary sorting mechanism <img src="https://i.stack.imgur.com/l6IEl.png" alt="Secondary Sorting Steps"> The partitioner and the group comparator use only natural key, the partitioner uses it to channel all records with the same natural key to a single reducer. This partitioning happens in the Map Phase, data from various Map tasks are received by reducers where they are grouped and then sent to the reduce method. This grouping is where the group comparator comes into picture, if we would not have specified a custom group comparator then Hadoop would have used the default implementation which would have considered the entire composite key, which would have lead to incorrect results. Overview of MR steps <img src="https://i.stack.imgur.com/nLgYK.png" alt="enter image description here">

hadoop map reduce secondary sorting

2 Answers

I find it easy to understand certain concepts with help of diagrams and this is certainly one of them.

Lets assume that our secondary sorting is on a composite key made out of Last Name and First Name.

Composite Key

With the composite key out of the way, now lets look at the secondary sorting mechanism

Secondary Sorting Steps

The partitioner and the group comparator use only natural key, the partitioner uses it to channel all records with the same natural key to a single reducer. This partitioning happens in the Map Phase, data from various Map tasks are received by reducers where they are grouped and then sent to the reduce method. This grouping is where the group comparator comes into picture, if we would not have specified a custom group comparator then Hadoop would have used the default implementation which would have considered the entire composite key, which would have lead to incorrect results.

Overview of MR steps

enter image description here

187

answered Oct 02 '22 13:10

Sudarshan

Grouping Comparator

Once the data reaches a reducer, all data is grouped by key. Since we have a composite key, we need to make sure records are grouped solely by the natural key. This is accomplished by writing a custom GroupPartitioner. We have a Comparator object only considering the yearMonth field of the TemperaturePair class for the purposes of grouping the records together.

public class YearMonthGroupingComparator extends WritableComparator {      public YearMonthGroupingComparator() {         super(TemperaturePair.class, true);     }      @Override     public int compare(WritableComparable tp1, WritableComparable tp2) {         TemperaturePair temperaturePair = (TemperaturePair) tp1;         TemperaturePair temperaturePair2 = (TemperaturePair) tp2;         return temperaturePair.getYearMonth().compareTo(temperaturePair2.getYearMonth());     } }

Here are the results of running our secondary sort job:

new-host-2:sbin bbejeck$ hdfs dfs -cat secondary-sort/part-r-00000

190101 -206

190102 -333

190103 -272

190104 -61

190105 -33

190106 44

190107 72

190108 44

190109 17

190110 -33

190111 -217

190112 -300

While sorting data by value may not be a common need, it’s a nice tool to have in your back pocket when needed. Also, we have been able to take a deeper look at the inner workings of Hadoop by working with custom partitioners and group partitioners. Refer this link also..What is the use of grouping comparator in hadoop map reduce

answered Oct 02 '22 14:10

Deepika C P

Related questions
                            
                                How to extract the sign of an integer in Ruby?
                            
                                "MongoError: Can't extract geo keys from object" with Type : Point
                            
                                Download attribute not working in Firefox
                            
                                Add Gradient background to layouts in Xamarin Forms visual studio
                            
                                What does the "^@" mean in file?
                            
                                How to check active transactions in SQL Server 2014?
                            
                                Octave Fontconfig error
                            
                                Android databinding set padding if value is true
                            
                                Add 30 days to date [duplicate]
                            
                                Executing terminal commands in Jupyter notebook
                            
                                TypeError: 'WebElement' object is not iterable error
                            
                                apksigner not accepting password

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

hadoop map reduce secondary sorting

Tags:

user1585111

People also ask

2 Answers

Sudarshan

Deepika C P

Recent Activity

Donate For Us