Postgres window function and group by exception

Tags:

sql

postgresql

aggregate-functions

window-functions

I'm trying to put together a query that will retrieve the statistics of a user (profit/loss) as a cumulative result, over a period of time.

Here's the query I have so far:

SELECT p.name, e.date, 
    sum(sp.payout) OVER (ORDER BY e.date)
    - sum(s.buyin) OVER (ORDER BY e.date) AS "Profit/Loss" 
FROM result r 
    JOIN game g ON r.game_id = g.game_id 
    JOIN event e ON g.event_id = e.event_id 
    JOIN structure s ON g.structure_id = s.structure_id 
    JOIN structure_payout sp ON g.structure_id = sp.structure_id
                            AND r.position = sp.position 
    JOIN player p ON r.player_id = p.player_id 
WHERE p.player_id = 17 
GROUP BY p.name, e.date, e.event_id, sp.payout, s.buyin
ORDER BY p.name, e.date ASC

The query runs, but the result is slightly incorrect. The reason is that an event can have multiple games (with different sp.payout values). Therefore, the above returns multiple rows if a user has 2 results in an event with different payouts (e.g. there are 4 games per event, and a user gets £20 from one and £40 from another).

The obvious solution would be to amend the GROUP BY to:

GROUP BY p.name, e.date, e.event_id

However, Postgres complains about this, as it doesn't appear to recognize that sp.payout and s.buyin are inside an aggregate function. I get the error:

column "sp.payout" must appear in the GROUP BY clause or be used in an aggregate function

I'm running 9.1 on an Ubuntu Linux server.
Am I missing something, or could this be a genuine defect in Postgres?

asked Jan 13 '12 01:01

Martin

1 Answer

You are not, in fact, using aggregate functions. You are using window functions. That's why PostgreSQL demands sp.payout and s.buyin to be included in the GROUP BY clause.

By appending an OVER clause, the aggregate function sum() is turned into a window function, which aggregates values per partition while keeping all rows.
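
To make the difference concrete, here is a minimal sketch on made-up data. The table demo_payout and its values are hypothetical and not part of the question's schema. The first statement collapses rows per group, the second keeps every row, and the commented-out third one mixes both the way the question does:

```sql
-- Hypothetical sample data, not the tables from the question.
CREATE TEMP TABLE demo_payout (event_id int, payout int);
INSERT INTO demo_payout VALUES (1, 20), (1, 40), (2, 10);

-- Aggregate function: one row per group.
SELECT event_id, sum(payout) AS total
FROM   demo_payout
GROUP  BY event_id
ORDER  BY event_id;

-- Window function: every input row is kept, each with its partition total.
SELECT event_id, payout
     , sum(payout) OVER (PARTITION BY event_id) AS event_total
FROM   demo_payout
ORDER  BY event_id, payout;

-- Window function over an ungrouped column: uncommenting this raises the
-- "must appear in the GROUP BY clause" error, because payout is neither
-- grouped nor wrapped in an aggregate function.
-- SELECT event_id, sum(payout) OVER (ORDER BY event_id)
-- FROM   demo_payout
-- GROUP  BY event_id;
```

The aggregate query returns two rows (totals 60 and 10), while the window query returns all three input rows with the matching total repeated per event.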

You can combine window functions and aggregate functions. Aggregations are applied first. I did not understand from your description how you want to handle multiple payouts / buyins per event. As a guess, I calculate a sum of them per event. Now I can remove sp.payout and s.buyin from the GROUP BY clause and get one row per player and event:

SELECT p.name
     , e.event_id
     , e.date
     , sum(sum(sp.payout)) OVER w
     - sum(sum(s.buyin  )) OVER w AS "Profit/Loss" 
FROM   player            p
JOIN   result            r ON r.player_id     = p.player_id  
JOIN   game              g ON g.game_id       = r.game_id 
JOIN   event             e ON e.event_id      = g.event_id 
JOIN   structure         s ON s.structure_id  = g.structure_id 
JOIN   structure_payout sp ON sp.structure_id = g.structure_id
                          AND sp.position     = r.position
WHERE  p.player_id = 17 
GROUP  BY p.player_id, e.event_id
WINDOW w AS (ORDER BY e.date, e.event_id)
ORDER  BY e.date, e.event_id;

In this expression: sum(sum(sp.payout)) OVER w, the outer sum() is a window function, the inner sum() is an aggregate function.

This assumes that p.player_id and e.event_id are the PRIMARY KEY of their respective tables.

I added e.event_id to the ORDER BY of the WINDOW clause to arrive at a deterministic sort order. (There could be multiple events on the same date.) I also included event_id in the result to distinguish multiple events per day.

As long as the query is restricted to a single player (WHERE p.player_id = 17), we don't need p.name or p.player_id in the ORDER BY clauses. PostgreSQL does not infer a functional dependency from the WHERE condition, though, so p.player_id (the primary key of player) has to be in GROUP BY for the ungrouped p.name to be allowed in the SELECT list. For a single player it does not change the grouping. If one of the joins multiplied rows unduly, the resulting sum would be incorrect (partly or completely multiplied). Grouping by p.name could not repair the query then.
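
The nested sum(sum(...)) OVER w expression can be tried in isolation. This is only a sketch on a hypothetical table demo_game that stands in for the joined rows of one player. Ordering by event_id alone is a simplification, since the demo has no date column:

```sql
-- Hypothetical stand-in for the joined rows of a single player.
CREATE TEMP TABLE demo_game (event_id int, payout int, buyin int);
INSERT INTO demo_game VALUES
   (1, 20, 10)
 , (1, 40, 10)
 , (2,  0, 10);

SELECT event_id
     , sum(payout) - sum(buyin) AS event_profit    -- plain aggregates, per event
     , sum(sum(payout)) OVER w
     - sum(sum(buyin )) OVER w  AS running_profit  -- window over the aggregates
FROM   demo_game
GROUP  BY event_id
WINDOW w AS (ORDER BY event_id)
ORDER  BY event_id;
```

With this data, event 1 shows a profit of 40 and a running profit of 40, event 2 a profit of -10 and a running profit of 30. The inner sum() collapses the two games of event 1 first, then the outer sum() accumulates over the grouped rows.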

I also removed e.date from the GROUP BY clause. Since PostgreSQL 9.1, grouping by the primary key e.event_id covers all columns of that table, so e.date can still be used in the SELECT list and in ORDER BY.
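
A small sketch of that primary key rule, using a hypothetical table demo_event reduced to two columns. The real event table presumably has more columns:

```sql
-- Hypothetical table; only the primary key matters for the demonstration.
CREATE TEMP TABLE demo_event (
   event_id int PRIMARY KEY
 , date     date NOT NULL
);
INSERT INTO demo_event VALUES (1, '2012-01-13'), (2, '2012-01-13');

-- Accepted since PostgreSQL 9.1: date is functionally dependent on event_id,
-- so it may appear in the SELECT list without being listed in GROUP BY.
SELECT event_id, date, count(*) AS ct
FROM   demo_event
GROUP  BY event_id
ORDER  BY event_id;
```

Without the PRIMARY KEY constraint, the same SELECT is rejected with the "must appear in the GROUP BY clause" error.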

If you change the query to return multiple players at once, adapt:

...
WHERE  p.player_id < 17  -- example - multiple players
GROUP  BY p.name, p.player_id, e.date, e.event_id  -- e.date and p.name redundant
WINDOW w AS (ORDER BY p.name, p.player_id, e.date, e.event_id)
ORDER  BY p.name, p.player_id, e.date, e.event_id;

Unless p.name is defined unique (?), group and order by player_id additionally to get correct results in a deterministic sort order.

I only kept e.date and p.name in GROUP BY to have an identical sort order in all clauses, hoping for a performance benefit. Otherwise, you can remove those columns there. (The same goes for e.date in the first query.)
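
Note that the window in the multi-player snippet has no PARTITION BY, so the running total continues from one player to the next. If the total should restart for each player (an assumption about the intent, the question does not say), a sketch with PARTITION BY could look like this. The table demo_stat and its data are hypothetical:

```sql
-- Hypothetical stand-in for the joined rows of several players.
CREATE TEMP TABLE demo_stat (player_id int, event_id int, payout int, buyin int);
INSERT INTO demo_stat VALUES
   (1, 1, 20, 10)
 , (1, 2,  5, 10)
 , (2, 1, 40, 10)
 , (2, 2, 15, 10);

SELECT player_id, event_id
     , sum(sum(payout)) OVER w
     - sum(sum(buyin )) OVER w AS "Profit/Loss"
FROM   demo_stat
GROUP  BY player_id, event_id
WINDOW w AS (PARTITION BY player_id ORDER BY event_id)  -- restart per player
ORDER  BY player_id, event_id;
```

With this data, player 1 ends up at 10 and then 5, and player 2 at 30 and then 35. Without PARTITION BY, the rows of player 2 would start from player 1's final total instead.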

answered Sep 22 '22 14:09

Erwin Brandstetter
