We are new to postgres, we have following query by which we can select top N records from each category.
create table temp (
gp char,
val int
);
insert into temp values ('A',10);
insert into temp values ('A',8);
insert into temp values ('A',6);
insert into temp values ('A',4);
insert into temp values ('B',3);
insert into temp values ('B',2);
insert into temp values ('B',1);
select a.gp,a.val
from temp a
where a.val in (
select b.val
from temp b
where a.gp=b.gp
order by b.val desc
limit 2);
Output of above query is something like this
gp val
----------
A 10
A 8
B 3
B 2
But our requirement is different, we want to select top n% records from each category where n is not fixed, n is based of some percent of elements in each group.
For each group, you can apply an aggregate function e.g., SUM() to calculate the sum of items or COUNT() to get the number of items in the groups. In this syntax: First, select the columns that you want to group e.g., column1 and column2 , and column that you want to apply an aggregate function ( column3 ).
The PostgreSQL GROUP BY clause is used in collaboration with the SELECT statement to group together those rows in a table that have identical data. This is done to eliminate redundancy in the output and/or compute aggregates that apply to these groups.
Calculating the “percentage of the total” for each row with Postgres can be done with a window function: SELECT *, (value / SUM(value) OVER ()) AS "% of total" FROM transactions WHERE quarter = '2015-03-31' and company_id = 1; We're using “OVER ()”, which means the sum over all rows returned by the where clause.
If you want to select data from all the columns of the table, you can use an asterisk ( * ) shorthand instead of specifying all the column names. The select list may also contain expressions or literal values. Second, specify the name of the table from which you want to query data after the FROM keyword.
Referencing the response from a_horse_with_no_name, you can achieve something similar using percent_rank()
SELECT
gp,
val,
pct_rank
FROM (
SELECT
gp,
val,
percent_rank() over (order by val desc) as pct_rank
FROM variables.temp
) t
WHERE pct_rank <= 0.75;
You can then set the final WHERE clause to return data at whatever percent_rank() threshold you require.
To retrieve the rows based on the percentage of the number of rows in each group you can use two window functions: one to count the rows and one to give them a unique number.
select gp,
val
from (
select gp,
val,
count(*) over (partition by gp) as cnt,
row_number() over (partition by gp order by val desc) as rn
from temp
) t
where rn / cnt <= 0.75;
SQLFiddle example: http://sqlfiddle.com/#!15/94fdd/1
Btw: using char
is almost always a bad idea because it is a fixed-length data type that is padded to the defined length. I hope you only did that for setting up the example and don't use it in your real table.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With