 

Hadoop Pig: return the top 5 rows of each group

I want to return the top 5 rows of each group. I have a table of state names and their cities, grouped by state name, and I want only the top 5 cities of each state rather than all of them.

How can I do this in Pig? Thank you in advance.

asked Dec 20 '22 12:12 by user1855165



1 Answer

After a GROUP BY, use a nested FOREACH block: inside it, ORDER each group's bag first, then LIMIT it. This sorts the cities in each group by city size (largest first) and then keeps only the top 5.

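-- hypothetical input: tab-separated (state, cityname, citysize)
A = LOAD 'cities.tsv' AS (state:chararray, cityname:chararray, citysize:long);
B = GROUP A BY state;
C = FOREACH B {
   -- sort each state's bag, largest city first
   DA = ORDER A BY citysize DESC;
   -- keep only the first 5 rows of the sorted bag
   DB = LIMIT DA 5;
   -- flatten one projected bag; flattening two separate bags would give their cross product
   GENERATE group AS state, FLATTEN(DB.(cityname, citysize));
}
DUMP C;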
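To sanity-check what the nested ORDER/LIMIT should produce on a small sample before running it on a cluster, the same per-group logic can be sketched in plain Python. This is an illustrative emulation, not Pig itself; the (state, cityname, citysize) tuple layout and the function name top_n_per_group are assumptions made for this sketch.

```python
from collections import defaultdict


def top_n_per_group(rows, n=5):
    """Emulate GROUP BY state + nested ORDER BY citysize DESC + LIMIT n.

    rows: iterable of (state, cityname, citysize) tuples (assumed layout).
    Returns flattened (state, cityname, citysize) tuples, like the GENERATE.
    """
    # GROUP A BY state: collect each state's cities into one list ("bag")
    groups = defaultdict(list)
    for state, city, size in rows:
        groups[state].append((city, size))

    result = []
    # States are sorted only so the output is deterministic
    for state in sorted(groups):
        # ORDER ... BY citysize DESC
        ranked = sorted(groups[state], key=lambda cs: cs[1], reverse=True)
        # LIMIT n, then flatten back into one row per city
        for city, size in ranked[:n]:
            result.append((state, city, size))
    return result


# Made-up sample data, for illustration only
sample = [("S1", "a", 10), ("S1", "b", 30), ("S1", "c", 20), ("S2", "d", 5)]
print(top_n_per_group(sample, n=2))
```

Pig itself makes no promise about the order in which groups are emitted, and rows tied on citysize may come out in either order there, whereas Python's stable sort keeps ties in input order.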
answered Jan 15 '23 16:01 by Donald Miner
