I want to return the top 5 rows of each group. I have a table of state names and their cities, grouped by state name, and I want only the top 5 cities of each state rather than all of them.
How can I do this in Pig? Thank you in advance.
After a GROUP BY, inside a nested FOREACH, you can do an ORDER BY first and then a LIMIT. This sorts the rows in each group by city size in descending order, then keeps the top 5.
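-- A is assumed to have the fields state, cityname and citysize.
B = GROUP A BY state;
C = FOREACH B {
    -- Sort each state's cities, largest first, and keep five of them.
    DA = ORDER A BY citysize DESC;
    DB = LIMIT DA 5;
    -- Flatten both fields together: two separate FLATTENs of DB.citysize
    -- and DB.cityname would produce their cross product (up to 25 rows).
    GENERATE group, FLATTEN(DB.(citysize, cityname));
}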
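For anyone who wants to try this end to end, here is a minimal sketch of a complete script. The input file name 'cities.tsv', its tab-separated layout and the field types are assumptions made for illustration; they are not part of the original question.

```pig
-- Hypothetical input: one tab-separated line per city (state, cityname, citysize).
A = LOAD 'cities.tsv' USING PigStorage('\t')
    AS (state:chararray, cityname:chararray, citysize:long);

-- One group per state; each group carries a bag of that state's rows.
B = GROUP A BY state;

C = FOREACH B {
    -- Largest cities first within the state.
    sorted = ORDER A BY citysize DESC;
    -- Keep at most five rows per state.
    top5 = LIMIT sorted 5;
    -- One output row per kept city: (state, cityname, citysize).
    GENERATE group AS state, FLATTEN(top5.(cityname, citysize));
}

DUMP C;
```

States with fewer than five cities simply return all of their cities, since LIMIT only caps the number of rows.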