<pre class="prettyprint"><code>grunt> dump jn; (k1,k4,10) (k1,k5,15) (k2,k4,9) (k3,k4,16) grunt> jn = group jn by $1; grunt> dump jn; (k4,{(k1,k4,10),(k2,k4,9),(k3,k4,16)}) (k5,{(k1,k5,15)}) </code></pre> Now, from here I want the following output : <pre class="prettyprint"><code>(k4,{(k3,k4,16),(k1,k4,10)}) (k5,{(k1,k5,15)}) </code></pre> Bascially, I want to sort on the numbers : 10,9,16 and select the top 2 for every row. How do I do it?

This is similar to this question and you could use a Nested FOREACH, e.g.: <pre class="prettyprint"><code>A = LOAD 'data'; jn = group A by $1; B = FOREACH jn { sorted = ORDER A by $2 ASC; lim = LIMIT sorted 2; GENERATE lim; }; DUMP B; </code></pre>

Pig Order By Query

Tags:

sql-order-by

group-by

apache-pig

grunt> dump jn;

(k1,k4,10)
(k1,k5,15)
(k2,k4,9)
(k3,k4,16)

grunt> jn = group jn by $1;
grunt> dump jn;


(k4,{(k1,k4,10),(k2,k4,9),(k3,k4,16)})
(k5,{(k1,k5,15)})

Now, from here I want the following output :

(k4,{(k3,k4,16),(k1,k4,10)})
(k5,{(k1,k5,15)})

Bascially, I want to sort on the numbers : 10,9,16 and select the top 2 for every row.
How do I do it?

505

asked Feb 03 '12 07:02

simplfuzz

1 Answers

This is similar to this question and you could use a Nested FOREACH, e.g.:

A = LOAD 'data';
jn = group A by $1;
B = FOREACH jn {
  sorted = ORDER A by $2 ASC;
  lim = LIMIT sorted 2;
  GENERATE lim;
};
DUMP B;

119

answered Nov 03 '22 04:11

Romain

Related questions
                            
                                Not a GROUP BY expression error [duplicate]
                            
                                SQL: Getting the period start and end datetimes from data like this? Tricky little puzzle I'm struggling with
                            
                                MySQL Left Join Many to One Row
                            
                                Applying cumulative mean function to a grouped object
                            
                                pandas groupby last n
                            
                                Group mysql results by "every 30 days"
                            
                                Create aggregate output data.table from function returning multiple output
                            
                                Best data structure to "group by" and aggregate values in Java?
                            
                                find the max of column group by another column pandas
                            
                                Get grouped value divided by total
                            
                                How can I group by the difference between rows in a column with linq and c#?
                            
                                turning pandas to pyspark expression
                            
                                How I can apply groupby two times on pandas data frame?
                            
                                Custom pandas groupby on a list of intervals
                            
                                Assign group averages to each row in python/pandas
                            
                                Get rows corresponding to the minimum with pandas groupby
                            
                                PostgreSQL group by without aggregate function. Why does it work?
                            
                                How to use group by in xslt
                            
                                Can we have dynamic group by clause in sql server?
                            
                                LINQ Grouping Data Twice

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With