Get the distinct sum of a joined table column

Question

I have a problem here, and I'm hoping there is an easy solution. I'll try to make this as simple as possible:

A ticket belongs to an attendee
Example:

select * from tickets JOIN attendees ON attendee.id = tickets.attendee_id

An attendee has a decimal column called "revenue"

That said, I need to run a query that will return a variety of information about the tickets, including the total revenue. The problem is that if 2 tickets belong to the same attendee, it counts their revenue twice. How can I sum the attendee revenue only once?

I don't want to use subqueries as my ORM makes this difficult. Plus a sub query solution doesn't scale if I want to do this for multiple columns.

Here's what I have:

1 attendees with a revenue of 100
2 tickets that both belong to that attendee

Select count(tickets.*) as tickets_count
     , sum(attendees.revenue) as atendees_revenue
from tickets LEFT OUTER JOIN attendees ON attendees.id = tickets.attendee_id;

=> This tells me that attendees_revenue is 200. I want it to be 100. Since there is one attendee in the database with an existing_revenue of 100. I do NOT want the attendee to be double counted.

Please let me know if this is possible.

Erwin Brandstetter · Accepted Answer

To get the result without subquery, you have to resort to advanced window function trickery:

SELECT sum(count(*))       OVER () AS tickets_count
     , sum(min(a.revenue)) OVER () AS atendees_revenue
FROM   tickets   t
JOIN   attendees a ON a.id = t.attendee_id
GROUP  BY t.attendee_id
LIMIT  1;

sqlfiddle

How does it work?

The key to understanding this is the sequence of events in the query:

aggregate functions -> window functions -> DISTINCT -> LIMIT

More details:

Best way to get result count before LIMIT was applied

Step by step:

I GROUP BY t.attendee_id - which you would normally do in a subquery.
Then I sum over the counts to get the total count of tickets. Not very efficient, but forced by your requirement. The aggregate function count(*) is wrapped in the window function sum( ... ) OVER () to arrive at the not-so-common expression: sum(count(*)) OVER ().

And sum the minimum revenue per attendee to get the sum without duplicates.

You could also use max() or avg() instead of min() to the same effect as revenue is guaranteed to be the same for every row per attendee.

This could be simpler if DISTINCT was allowed in window functions, but PostgreSQL has not (yet) implemented this feature. Per documentation:

Aggregate window functions, unlike normal aggregate functions, do not allow DISTINCT or ORDER BY to be used within the function argument list.
Final step is to get a single row. This could be done with DISTINCT (SQL standard) since all rows are the same. LIMIT 1 will be faster, though. Or the SQL-standard form FETCH FIRST 1 ROWS ONLY.

Heriberto Magaña · Answer

What about a simple division:

 Select count(tickets.*) as tickets_count
 , sum(attendees.revenue) / count(attendees.id) as atendees_revenue
 from tickets LEFT OUTER JOIN attendees ON attendees.id = 
 tickets.attendee_id;

This should handle duplicates, triplicates, etcetera.

Get the distinct sum of a joined table column

Tags:

sql

postgresql

aggregate-functions

duplicate-removal

window-functions

Binary Logic

2 Answers

How does it work?

Erwin Brandstetter

Heriberto Magaña

Recent Activity

Donate For Us

Get the distinct sum of a joined table column

Tags:

sql

postgresql

aggregate-functions

duplicate-removal

window-functions

Binary Logic

2 Answers

How does it work?

Erwin Brandstetter

Heriberto Magaña

Related questions

Recent Activity

Donate For Us