I've been trying to wrap my head around creating aggregates in PostgreSQL (either 8.4 or 9.1) that accept one or more option parameters.
An example would be creating a PL/R
extension to compute the p-th quantile, with 0 <= p <= 1
. This would look something like quantile(x,p)
, and as part of a query:
select category,quantile(x,0.25)
from TABLE
group by category
order by category;
Where TABLE (category:text, x:float)
.
Suggestions?
For each group, you can apply an aggregate function e.g., SUM() to calculate the sum of items or COUNT() to get the number of items in the groups. In this syntax: First, select the columns that you want to group e.g., column1 and column2 , and column that you want to apply an aggregate function ( column3 ).
To define a new aggregate function, one selects a data type for the state value, an initial value for the state, and a state transition function. The state transition function takes the previous state value and the aggregate's input value(s) for the current row, and returns a new state value.
Fortunately, PostgreSQL is really flexible and allows end users to create your own aggregation functions, which can help to move your business logic to PostgreSQL. The “CREATE AGGREGATE” command is there to create all kinds of aggregations.
Aggregate functions are not allowed because the WHERE clause is used for filtering data before aggregation. So while WHERE isn't for aggregation, it has other uses. To filter data based on an aggregate function result, you must use the HAVING clause.
Hopefully this example will help. You need a function that takes (accumulator, aggregate-arguments) and returns the new accumulator value. Play around with the code below and that should give you a feel for how it all fits together.
BEGIN;
CREATE FUNCTION sum_product_fn(int,int,int) RETURNS int AS $$
SELECT $1 + ($2 * $3);
$$ LANGUAGE SQL;
CREATE AGGREGATE sum_product(int, int) (
sfunc = sum_product_fn,
stype = int,
initcond = 0
);
SELECT
sum(i) AS one,
sum_product(i, 2) AS double,
sum_product(i,3) AS triple
FROM generate_series(1,3) i;
ROLLBACK;
That should give you something like:
one | double | triple
-----+--------+--------
6 | 12 | 18
This can be achieved with the ntile windowing function
-- To calculate flexible quantile ranges in postgresql, for example to calculate n equal
-- frequency buckets for your data for use in a visualisation (such as binning for a
-- choropleth map), you can use the following SQL:
-- this functions returns 6 equal frequency bucket ranges for my_column.
SELECT ntile, avg(my_column) AS avgAmount, max(my_column) AS maxAmount, min(my_column) AS minAmount
FROM (SELECT my_column, ntile(6) OVER (ORDER BY my_column) AS ntile FROM my_table) x
GROUP BY ntile ORDER BY ntile
You can find more on the ntile() function and windowing at http://database-programmer.blogspot.com/2010/11/really-cool-ntile-window-function.html
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With