Hive QL - Limiting number of rows per each item

Tags:

hql

hadoop

hive

hiveql

If I have multiple items listed in a WHERE clause, how would I go about limiting the results to N rows for each item in the list?

Example:

select a_id,b,c, count(*) as sumrequests
from table_name
where
a_id in (1,2,3)
group by a_id,b,c
limit 10000

asked Jul 31 '12 23:07

Eric Philmore

1 Answer

It sounds like you want the top N rows per a_id. You can do this with a window function; these were introduced in Hive 0.11. Something like:

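SELECT a_id, b, c, count(*) AS sumrequests
FROM (
    -- Number the rows within each a_id; with no ORDER BY the numbering is arbitrary.
    -- The alias is rn rather than row, because ROW is a HiveQL keyword.
    SELECT a_id, b, c, row_number() OVER (PARTITION BY a_id) AS rn
    FROM table_name
    ) rs
WHERE rn <= 10000
AND a_id IN (1, 2, 3)
GROUP BY a_id, b, c;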

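This keeps up to 10,000 rows per a_id and then counts them by a_id, b and c. Because the window has no ORDER BY, the rows kept for each a_id are arbitrary rather than a true random sample. You can add more columns to the PARTITION BY if you want the limit to apply to something finer than a_id. You can also use ORDER BY inside the OVER clause to control which rows are kept; there are plenty of examples out there that show the additional options.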
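As a rough sketch of the ORDER BY option, the variant below aggregates first and then ranks the groups, so that each a_id keeps only its largest (b, c) groups by request count. It assumes that "top N" means the N highest counts; the cutoff of 3 and the aliases grp, ranked and rn are made up for illustration, while table_name and the columns come from the question.

```sql
-- Sketch only: keep the 3 largest (b, c) groups per a_id.
SELECT a_id, b, c, sumrequests
FROM (
    -- Rank each group within its a_id, largest count first.
    SELECT a_id, b, c, sumrequests,
           row_number() OVER (PARTITION BY a_id ORDER BY sumrequests DESC) AS rn
    FROM (
        -- Aggregate before ranking, filtering to the wanted a_id values early.
        SELECT a_id, b, c, count(*) AS sumrequests
        FROM table_name
        WHERE a_id IN (1, 2, 3)
        GROUP BY a_id, b, c
    ) grp
) ranked
WHERE rn <= 3;
```

Note that row_number() breaks ties arbitrarily; if groups with equal counts should all be kept, rank() may be a better fit.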

answered Nov 05 '22 23:11

Carter Shanklin
