 

Representative key from grouping comparator

Tags:

hadoop

In Hadoop, I can set a grouping comparator to determine which keys (and their values) are grouped together within a single reduce call. But the signature of reduce takes only one key, so if I group composite keys based on one attribute, which key is presented to the reducer in the reduce call?

abhinavkulkarni asked Nov 01 '22 13:11


1 Answer

This depends on the implementation. Judging from the description of the issue that led to the grouping comparator being implemented, the key passed to the reduce method is the first key encountered in each group, that is, the first one in sort order.

Say your reduce inputs look like:

A1, V1
A2, V2
A3, V3
B1, V4
B2, V5

Instead of getting calls to reduce that look like:

reduce(A1, {V1});
reduce(A2, {V2});
reduce(A3, {V3});
reduce(B1, {V4});
reduce(B2, {V5});

you could define the grouping comparator to compare only the letters and end up with:

reduce(A1, {V1,V2,V3});
reduce(B1, {V4,V5});

which is the desired outcome of using a grouping comparator. The key passed to each call (A1, B1) is the first key of its group.
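To make the grouping behaviour concrete, here is a minimal plain-Java sketch that simulates it without any Hadoop dependency. It is not Hadoop's actual code: the class name `GroupingSketch` and the method `simulate` are made up for illustration, and the sketch assumes the input is already sorted and that each key is compared with the key read just before it.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the reduce-side grouping step; not Hadoop code.
public class GroupingSketch {

    // Walks the sorted (key, value) pairs and emits one "reduce call" per group.
    // Two consecutive keys belong to the same group when the grouping
    // comparator returns 0 for them.
    public static List<String> simulate(List<Map.Entry<String, String>> sorted,
                                        Comparator<String> grouping) {
        List<String> calls = new ArrayList<>();
        int i = 0;
        while (i < sorted.size()) {
            // The first key of the group is the one handed to reduce.
            String firstKey = sorted.get(i).getKey();
            String previousKey = firstKey;
            List<String> values = new ArrayList<>();
            while (i < sorted.size()
                    && grouping.compare(previousKey, sorted.get(i).getKey()) == 0) {
                previousKey = sorted.get(i).getKey();
                values.add(sorted.get(i).getValue());
                i++;
            }
            calls.add("reduce(" + firstKey + ", {" + String.join(",", values) + "})");
        }
        return calls;
    }

    public static void main(String[] args) {
        // Same sample input as above, already in sort order.
        List<Map.Entry<String, String>> input = Arrays.asList(
                new SimpleEntry<>("A1", "V1"),
                new SimpleEntry<>("A2", "V2"),
                new SimpleEntry<>("A3", "V3"),
                new SimpleEntry<>("B1", "V4"),
                new SimpleEntry<>("B2", "V5"));

        // Grouping comparator that looks only at the leading letter.
        Comparator<String> byLetter =
                (a, b) -> Character.compare(a.charAt(0), b.charAt(0));

        // Prints:
        // reduce(A1, {V1,V2,V3})
        // reduce(B1, {V4,V5})
        simulate(input, byLetter).forEach(System.out::println);
    }
}
```

In a real job the comparator would be registered with `job.setGroupingComparatorClass(...)` (or `JobConf.setOutputValueGroupingComparator(...)` in the older API). One caveat worth checking against your Hadoop version: as far as I know, the newer `org.apache.hadoop.mapreduce` API reuses the key object, so its contents may change as you iterate over the values.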

Amar answered Nov 15 '22 06:11