HiveQL - Limiting the number of rows per item

If I have multiple items listed in a WHERE clause, how would one go about limiting the results to N rows for each item in the list?

Example:

select a_id, b, c, count(*) as sumrequests
from table_name
where
a_id in (1,2,3)
group by a_id,b,c
limit 10000
Eric Philmore, asked Jul 31 '12 at 23:07



People also ask

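How do I restrict the number of rows in Hive?

The LIMIT clause constrains the number of rows returned by a SELECT statement. LIMIT takes one or two numeric arguments, both of which must be non-negative integer constants. With two arguments (supported as of Hive 2.0.0), the first specifies the offset of the first row to return and the second specifies the maximum number of rows to return. With a single argument, it is the maximum number of rows, starting from the first row.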
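A minimal sketch of the one- and two-argument forms. It uses Python's built-in sqlite3 module as a stand-in for Hive, on the assumption that SQLite reads `LIMIT offset, count` the same way (skip `offset` rows, then return at most `count`); the `events` table and its contents are hypothetical.

```python
import sqlite3

# SQLite stands in for Hive here; both read "LIMIT offset, count" as
# "skip `offset` rows, then return at most `count` rows".
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER)")  # hypothetical table
conn.executemany("INSERT INTO events VALUES (?)", [(i,) for i in range(1, 11)])

# One argument: at most 3 rows, starting at the first row (offset 0).
first_page = conn.execute("SELECT id FROM events ORDER BY id LIMIT 3").fetchall()

# Two arguments: skip 3 rows, then return at most 3 rows.
second_page = conn.execute("SELECT id FROM events ORDER BY id LIMIT 3, 3").fetchall()

print(first_page)   # [(1,), (2,), (3,)]
print(second_page)  # [(4,), (5,), (6,)]
```

In Hive releases before 2.0.0, only the single-argument form is accepted.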

How do I select only a few rows in Hive?

Sort the records with ORDER BY first, then apply LIMIT to cap the number of records. Without ORDER BY, LIMIT returns an arbitrary subset of rows, and which rows you get is not guaranteed.
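A small sketch of the ORDER BY + LIMIT pattern, again running the SQL through Python's sqlite3 module in place of Hive; the `requests` table and its values are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE requests (path TEXT, hits INTEGER)")  # hypothetical table
conn.executemany(
    "INSERT INTO requests VALUES (?, ?)",
    [("/a", 5), ("/b", 42), ("/c", 17), ("/d", 8), ("/e", 23)],
)

# ORDER BY decides which rows come first; LIMIT then keeps only the first 3.
top3 = conn.execute(
    "SELECT path, hits FROM requests ORDER BY hits DESC LIMIT 3"
).fetchall()

print(top3)  # [('/b', 42), ('/e', 23), ('/c', 17)]
```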

What are the main limitations of SQL on Hive?

Limitations of Hive: it is built for batch online analytical processing (OLAP), not online transaction processing (OLTP), so query latency is high and it is not suited to real-time queries. Row-level updates and deletes were unsupported in early releases; Hive 0.14 added UPDATE and DELETE, but only for tables configured as transactional (ACID).

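Does a reducer run if you use LIMIT 1 in a HiveQL query?

Not necessarily; it depends on the rest of the query. LIMIT by itself does not require a reduce phase: a simple SELECT ... LIMIT n can often be served by a fetch task or a map-only job. If the query also uses GROUP BY, ORDER BY, JOIN, or similar operations, those still run reducers, and the limit is applied to their output.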


1 Answer

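It sounds like you want the top N rows per a_id. You can do this with a window function (introduced in Hive 0.11): aggregate first, number the aggregated rows within each a_id, then keep the first N. Something like:

SELECT a_id, b, c, sumrequests
FROM (
    SELECT a_id, b, c, sumrequests,
           row_number() OVER (PARTITION BY a_id ORDER BY sumrequests DESC) AS rn
    FROM (
        SELECT a_id, b, c, count(*) AS sumrequests
        FROM table_name
        WHERE a_id IN (1, 2, 3)
        GROUP BY a_id, b, c
    ) grouped
) ranked
WHERE rn <= 10000;

This outputs up to 10,000 (a_id, b, c) rows per a_id, keeping those with the highest sumrequests. Without an ORDER BY inside OVER (...), the numbering within each a_id is arbitrary and you get an unspecified 10,000 rows. Add more columns to PARTITION BY if you want N rows per combination of columns rather than per a_id alone. The alias rn replaces row because ROW is a reserved keyword in newer Hive releases.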
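To try the per-a_id ranking idea (number rows with row_number() partitioned by a_id, keep rn <= N) without a cluster, here is a minimal sketch that applies it after the GROUP BY and runs it through Python's sqlite3 module. It assumes SQLite 3.25 or later (the first release with window functions) as a stand-in for Hive. `table_name`, `a_id`, `b` and `c` come from the question; the sample rows and the `top_n_per_id` helper are hypothetical.

```python
import sqlite3

# Aggregate first, then rank the aggregated groups within each a_id.
# The extra "b, c" sort keys only make ties deterministic.
TOP_N_PER_ID = """
SELECT a_id, b, c, sumrequests
FROM (
    SELECT a_id, b, c, sumrequests,
           ROW_NUMBER() OVER (PARTITION BY a_id ORDER BY sumrequests DESC, b, c) AS rn
    FROM (
        SELECT a_id, b, c, COUNT(*) AS sumrequests
        FROM table_name
        WHERE a_id IN (1, 2, 3)
        GROUP BY a_id, b, c
    ) grouped
) ranked
WHERE rn <= ?
ORDER BY a_id, rn
"""


def top_n_per_id(conn, n):
    """Hypothetical helper: at most n (a_id, b, c, sumrequests) rows per a_id, busiest first."""
    return conn.execute(TOP_N_PER_ID, (n,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_name (a_id INTEGER, b TEXT, c TEXT)")
rows = (
    [(1, "x", "p")] * 3 + [(1, "y", "p")] * 2 + [(1, "z", "p")]
    + [(2, "x", "q")] * 4 + [(2, "y", "q")]
    + [(4, "x", "r")] * 5  # a_id 4 is filtered out by the WHERE clause
)
conn.executemany("INSERT INTO table_name VALUES (?, ?, ?)", rows)

for row in top_n_per_id(conn, 2):
    print(row)
# (1, 'x', 'p', 3)
# (1, 'y', 'p', 2)
# (2, 'x', 'q', 4)
# (2, 'y', 'q', 1)
```

With N = 2, a_id 1 keeps only its two busiest groups and drops (1, 'z', 'p'), while a_id 2 has just two groups and keeps both.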

Carter Shanklin, answered Nov 05 '22 at 23:11
