Using distinct and then an aggregate function in Postgresql?

This is a pretty basic problem and for whatever reason I can't find a reasonable solution. I'll do my best to explain.

Say you have an event ticket (section, row, seat #). Each ticket belongs to an attendee. Multiple tickets can belong to the same attendee. Each attendee has a worth (ex: Attendee #1 is worth $10,000). That said, here's what I want to do:

1. Group the tickets by their section
2. Get number of tickets (count)
3. Get total worth of the attendees in that section

Here's where I'm having problems: if Attendee #1 is worth $10,000 and is using 4 tickets, sum(attendees.worth) returns $40,000, which is not accurate. The worth should be $10,000. Yet when I make the result distinct on the attendee, the ticket count is no longer accurate. In an ideal world it'd be nice to do something like:

select 
    tickets.section, 
    count(tickets.*) as count, 
    sum(DISTINCT ON (attendees.id) attendees.worth) as total_worth 
from 
    tickets 
    INNER JOIN 
    attendees ON attendees.id = tickets.attendee_id 
GROUP BY tickets.section

Obviously this query doesn't work. How can I accomplish the same thing in a single query? Or is it even possible? I'd prefer to stay away from subqueries too, because this is part of a much larger solution where I would need to do this across multiple tables.

Also, the worth should follow the tickets, divided evenly. Ex: $10,000 / 4 means each ticket carries an attendee worth of $2,500. So if the tickets are in different sections, they take their prorated worth with them.

Thanks for your help.

Binary Logic asked Oct 08 '12 17:10


People also ask

Can we use aggregate function with distinct?

You can use DISTINCT when performing an aggregation. You'll probably use it most commonly with the COUNT function, for example COUNT(DISTINCT month) to count the unique values in a month column.

How do you sum unique values in PostgreSQL?

The PostgreSQL SUM() is an aggregate function that returns the sum of values or distinct values. SUM() ignores NULL, so NULL values do not affect the result. If you use the DISTINCT option, SUM() calculates the sum of the distinct values only.
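For the ticket problem in this thread, it is worth noting that SUM(DISTINCT ...) deduplicates by value, not by attendee. A minimal sketch, assuming made-up sample rows (the VALUES lists are purely illustrative; the table and column names follow the question):

```sql
-- Hypothetical sample data: two different attendees with the same worth.
WITH attendees (id, worth) AS (
    VALUES (1, 10000), (2, 10000)
),
tickets (section, attendee_id) AS (
    VALUES ('A', 1), ('A', 1), ('A', 2)
)
SELECT t.section,
       count(*)               AS ticket_count,
       sum(DISTINCT a.worth)  AS total_worth  -- collapses equal worth values
FROM tickets t
JOIN attendees a ON a.id = t.attendee_id
GROUP BY t.section;
```

With this data, section A has three tickets held by two attendees worth 10,000 each, but sum(DISTINCT a.worth) reports 10,000 rather than 20,000, because the two equal values count once. So it only looks like a fix when no two attendees share the same worth.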

What is faster distinct or group by Postgres?

From experiments, I found that GROUP BY was 10+ times faster than DISTINCT. They are different. So what I learned is: GROUP BY is at least no worse than DISTINCT, and it is sometimes better.

Can we use distinct in PostgreSQL?

Removing duplicate rows from a query result set in PostgreSQL can be done using the SELECT statement with the DISTINCT clause. It keeps one row for each group of duplicates. The DISTINCT clause can be used for a single column or for a list of columns.


1 Answer

You need to aggregate the tickets before the attendees:

select ta.section, sum(ta.numtickets) as count, sum(a.worth) as total_worth
from (select attendee_id, section, count(*) as numtickets
      from tickets
      group by attendee_id, section
     ) ta INNER JOIN
     attendees a
     ON a.id = ta.attendee_id
GROUP BY ta.section

You still have a problem of a single attendee having seats in multiple sections. However, you do not specify how to solve that (apportion the worth? randomly choose one section? attribute it to all sections? canonically choose a section?)
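If the worth is apportioned evenly across an attendee's tickets, as the question describes, one possible sketch is to compute each attendee's total ticket count first and let every ticket carry worth / total_tickets into its own section. The sample rows and the per_attendee name below are assumptions for illustration, not part of the original schema, and the ::numeric cast is only there to avoid integer division in case worth is stored as an integer:

```sql
-- Hypothetical sample data: attendee 1 holds 4 tickets (3 in A, 1 in B),
-- attendee 2 holds 2 tickets (1 in A, 1 in B).
WITH attendees (id, worth) AS (
    VALUES (1, 10000), (2, 6000)
),
tickets (section, attendee_id) AS (
    VALUES ('A', 1), ('A', 1), ('A', 1), ('B', 1),
           ('A', 2), ('B', 2)
),
-- Total number of tickets per attendee, across all sections.
per_attendee AS (
    SELECT attendee_id, count(*) AS total_tickets
    FROM tickets
    GROUP BY attendee_id
)
SELECT t.section,
       count(*) AS ticket_count,
       -- Each ticket contributes its prorated share of the attendee's worth.
       sum(a.worth::numeric / p.total_tickets) AS total_worth
FROM tickets t
JOIN attendees a    ON a.id = t.attendee_id
JOIN per_attendee p ON p.attendee_id = t.attendee_id
GROUP BY t.section
ORDER BY t.section;
```

With this data each of attendee 1's tickets is worth 2,500 and each of attendee 2's is worth 3,000, so section A should come out to 4 tickets and 10,500, and section B to 2 tickets and 5,500 (numeric formatting aside). The section totals add up to 16,000, the combined worth of both attendees, so nothing is double counted. This still uses a subquery; it is just moved into a CTE so the per-attendee count is computed once.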

Gordon Linoff answered Oct 21 '22 14:10