
Pig Order By Query

grunt> dump jn;

(k1,k4,10)
(k1,k5,15)
(k2,k4,9)
(k3,k4,16)

grunt> jn = group jn by $1;
grunt> dump jn;


(k4,{(k1,k4,10),(k2,k4,9),(k3,k4,16)})
(k5,{(k1,k5,15)})

Now, from here I want the following output:

(k4,{(k3,k4,16),(k1,k4,10)})
(k5,{(k1,k5,15)})

Basically, I want to sort each group on the numbers (10, 9, 16) and select the top 2 for every row.
How do I do it?

simplfuzz asked Feb 03 '12 07:02



1 Answer

This is similar to this question. You could use a nested FOREACH to sort and limit the bag within each group, e.g.:

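-- Assumption: 'data' is tab-delimited with three fields. The names f1, f2, n are
-- illustrative; n is declared int so ORDER compares numerically, not as bytes.
A = LOAD 'data' AS (f1:chararray, f2:chararray, n:int);
jn = GROUP A BY $1;
B = FOREACH jn {
  sorted = ORDER A BY $2 DESC;  -- DESC so the largest values come first
  lim = LIMIT sorted 2;         -- keep the top 2 tuples of each group
  GENERATE group, lim;          -- emit the key as well, as in the desired output
};
DUMP B;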
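
As an alternative sketch (not part of the original answer), Pig's built-in TOP function can select the N tuples with the largest value in a given column, without an explicit ORDER and LIMIT. This assumes the same tab-delimited 'data' file and a Pig version that ships TOP; the field names f1, f2 and n are hypothetical. The tuples inside the returned bag are not guaranteed to come back in sorted order.

```pig
-- Assumption: three tab-delimited fields; n is declared int for numeric comparison.
A = LOAD 'data' AS (f1:chararray, f2:chararray, n:int);
jn = GROUP A BY f2;
-- TOP(topN, column, bag): keep the 2 tuples with the largest value in column 2 (0-indexed).
B = FOREACH jn GENERATE group, TOP(2, 2, A);
DUMP B;
```

If the order inside each bag matters, the nested FOREACH with ORDER ... DESC is the safer choice.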
Romain answered Nov 03 '22 04:11