Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Postgresql : How do I select top n percent(%) entries from each group/category

We are new to postgres, we have following query by which we can select top N records from each category.

 create table temp (
     gp char,
     val int
 );

 insert into temp values ('A',10);
 insert into temp values ('A',8);
 insert into temp values ('A',6);
 insert into temp values ('A',4);
 insert into temp values ('B',3);
 insert into temp values ('B',2);
 insert into temp values ('B',1);

 select a.gp,a.val
 from   temp a
 where  a.val in (
              select b.val
              from   temp b
              where  a.gp=b.gp
              order by b.val desc
             limit 2);

Output of above query is something like this

 gp   val
 ----------
 A    10
 A    8
 B    3
 B    2

But our requirement is different, we want to select top n% records from each category where n is not fixed, n is based of some percent of elements in each group.

like image 784
dpilwal Avatar asked Jul 08 '14 07:07

dpilwal


People also ask

Is it possible to apply an aggregate function for each group in PostgreSQL?

For each group, you can apply an aggregate function e.g., SUM() to calculate the sum of items or COUNT() to get the number of items in the groups. In this syntax: First, select the columns that you want to group e.g., column1 and column2 , and column that you want to apply an aggregate function ( column3 ).

What does group by do in PostgreSQL?

The PostgreSQL GROUP BY clause is used in collaboration with the SELECT statement to group together those rows in a table that have identical data. This is done to eliminate redundancy in the output and/or compute aggregates that apply to these groups.

How do I calculate percentage of PostgreSQL?

Calculating the “percentage of the total” for each row with Postgres can be done with a window function: SELECT *, (value / SUM(value) OVER ()) AS "% of total" FROM transactions WHERE quarter = '2015-03-31' and company_id = 1; We're using “OVER ()”, which means the sum over all rows returned by the where clause.

How do I SELECT values in PostgreSQL?

If you want to select data from all the columns of the table, you can use an asterisk ( * ) shorthand instead of specifying all the column names. The select list may also contain expressions or literal values. Second, specify the name of the table from which you want to query data after the FROM keyword.


2 Answers

Referencing the response from a_horse_with_no_name, you can achieve something similar using percent_rank()

SELECT
    gp,
    val,
    pct_rank
FROM (
    SELECT
        gp,
        val,
        percent_rank() over (order by val desc) as pct_rank
    FROM variables.temp
    ) t
WHERE pct_rank <= 0.75;

You can then set the final WHERE clause to return data at whatever percent_rank() threshold you require.

like image 118
xrpza Avatar answered Sep 26 '22 12:09

xrpza


To retrieve the rows based on the percentage of the number of rows in each group you can use two window functions: one to count the rows and one to give them a unique number.

select gp,
       val
from (
  select gp, 
         val,
         count(*) over (partition by gp) as cnt,
         row_number() over (partition by gp order by val desc) as rn
  from temp
) t
where rn / cnt <= 0.75;

SQLFiddle example: http://sqlfiddle.com/#!15/94fdd/1


Btw: using char is almost always a bad idea because it is a fixed-length data type that is padded to the defined length. I hope you only did that for setting up the example and don't use it in your real table.

like image 22
a_horse_with_no_name Avatar answered Sep 22 '22 12:09

a_horse_with_no_name