Hadoop PIG Max of Tuple

How do I find the MAX of a tuple in Pig?

My data and code look like this:

A,20
B,10
C,40
D,5

data = LOAD 'myData.txt' USING PigStorage(',') AS (key:chararray, value:int);
grouped = GROUP data ALL;
maxKey = FOREACH grouped GENERATE MAX(data.value);
DUMP maxKey;

This returns 40, but I want the full key-value pair: C,40. Any ideas?

asked Dec 27 '12 14:12 by supyo


2 Answers

This works with Pig 0.10.0:

data = LOAD 'myData.txt' USING PigStorage(',') AS (key, value: long);
A = GROUP data ALL;
B = FOREACH A GENERATE MAX(data.value) AS val;
-- B holds a single row, so B.val can be used as a scalar here
C = FILTER data BY value == (long)B.val;
DUMP C;
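
For comparison, here is a minimal sketch of the same lookup using Pig's built-in TOP function instead of a scalar comparison. It assumes the same myData.txt and schema as above; the alias names grouped and top1 are only illustrative. TOP takes the number of tuples to keep, the 0-based index of the column to compare, and the bag to search.

```pig
-- Same input as above: key in column 0, value in column 1
data = LOAD 'myData.txt' USING PigStorage(',') AS (key:chararray, value:long);

-- Collect every row into a single bag
grouped = GROUP data ALL;

-- Keep the 1 tuple with the largest value in column 1, then unnest it
top1 = FOREACH grouped GENERATE FLATTEN(TOP(1, 1, data));

DUMP top1;
```

One difference to keep in mind: the FILTER version returns every row that ties for the maximum, while TOP with a count of 1 returns only one of them.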
answered Sep 27 '22 23:09 by Frederic


Try this:

data = LOAD 'myData.txt' USING PigStorage(',') AS (key: chararray, value: int);
sorted = ORDER data BY value DESC;
limited = LIMIT sorted 1;
projected = FOREACH limited GENERATE key;
DUMP projected;
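
Since the question asks for the full key-value pair rather than the key alone, a small variation may be closer to what is wanted. This is only a sketch under the same assumptions (same file, same schema): it skips the final projection and dumps the limited relation directly, so both fields are kept.

```pig
data = LOAD 'myData.txt' USING PigStorage(',') AS (key: chararray, value: int);

-- Largest value first
sorted = ORDER data BY value DESC;

-- Keep only the first row of the sorted relation
limited = LIMIT sorted 1;

-- No projection: the row still carries both key and value
DUMP limited;
```

If several rows share the maximum value, LIMIT 1 keeps just one of them.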
answered Sep 28 '22 00:09 by Ruslan