If I have multiple items listed in a WHERE clause, how would I go about limiting the results to N for each item in the list?
Example:
select a_id, b, c, count(*) as sumrequests
from table_name
where
a_id in (1,2,3)
group by a_id, b, c
limit 10000
The LIMIT clause can be used to constrain the number of rows returned by the SELECT statement. LIMIT takes one or two numeric arguments, which must both be non-negative integer constants. The first argument specifies the offset of the first row to return (as of Hive 2.0).
Solution. Order the records first and then apply the LIMIT clause to limit the number of records.
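As a minimal sketch of ordering before limiting, assuming the table and column names from the question: the query below sorts the aggregated groups and keeps the ten with the most requests. Note that this LIMIT applies to the whole result set, not to each a_id separately.

```sql
-- Sort the aggregated groups, then keep only the first 10 overall.
-- table_name, a_id, b, c are the names used in the question.
SELECT a_id, b, c, count(*) AS sumrequests
FROM table_name
WHERE a_id IN (1, 2, 3)
GROUP BY a_id, b, c
ORDER BY sumrequests DESC
LIMIT 10;
```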
Limitations of Hive: It does not offer real-time queries for row-level updates. The latency of Apache Hive queries is very high. Hive only supports online analytical processing (OLAP) and doesn't support online transaction processing (OLTP). Hive Query Language doesn't support the transaction processing feature.
No reducer runs for a plain SELECT with a LIMIT clause, that is, one with no aggregation, join, or ordering.
It sounds like you want the top N rows per a_id. You can do this with a window function; these were introduced in Hive 0.11. Something like:
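SELECT a_id, b, c, count(*) AS sumrequests
FROM (
-- Number the rows within each a_id; the alias is rn because ROW is a reserved word in newer Hive versions
SELECT a_id, b, c, row_number() OVER (PARTITION BY a_id) AS rn
FROM table_name
-- Filter early so only the requested a_id values are numbered
WHERE a_id IN (1, 2, 3)
) rs
WHERE rn <= 10000
GROUP BY a_id, b, c;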
This keeps up to 10,000 rows per a_id before aggregating. Because the window has no ORDER BY, which rows are kept is arbitrary. You can partition further if you want the limit to apply to more than just a_id. You can also add an ORDER BY inside the window function to control which rows are kept; there are plenty of examples out there that show additional options.
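If what you actually want is the top N aggregated groups per a_id rather than N raw rows, one possible variation is to aggregate first and rank afterwards. This is only a sketch under that assumption: the limit of 3, the aliases agg, ranked, and rn, and the ordering by sumrequests are illustrative choices, not part of the original question.

```sql
-- Aggregate first, then keep the 3 largest groups for each a_id.
SELECT a_id, b, c, sumrequests
FROM (
  SELECT a_id, b, c, sumrequests,
         -- Rank the groups inside each a_id, largest count first
         row_number() OVER (PARTITION BY a_id ORDER BY sumrequests DESC) AS rn
  FROM (
    SELECT a_id, b, c, count(*) AS sumrequests
    FROM table_name
    WHERE a_id IN (1, 2, 3)
    GROUP BY a_id, b, c
  ) agg
) ranked
WHERE rn <= 3;
```

With an ORDER BY in the window the result is deterministic apart from ties; swapping row_number() for rank() would keep all tied groups instead of cutting them off arbitrarily.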