Left outer join two levels deep in Postgres results in cartesian product

Given the following 4 tables:

CREATE TABLE events ( id, name )
CREATE TABLE profiles ( id, event_id )
CREATE TABLE donations ( amount, profile_id )
CREATE TABLE event_members( id, event_id, user_id )

I'm attempting to get a list of all events, along with a count of any members and a sum of any donations. The problem is that the sum of donations comes back wrong: it appears to be a Cartesian product (donations * number of event_members).

Here is the SQL query (Postgres):

SELECT events.name, COUNT(DISTINCT event_members.id), SUM(donations.amount)
FROM            events
LEFT OUTER JOIN profiles      ON events.id = profiles.event_id
LEFT OUTER JOIN donations     ON donations.profile_id = profiles.id
LEFT OUTER JOIN event_members ON event_members.event_id = events.id
GROUP BY events.name

SUM(donations.amount) comes back equal to the actual sum of donations multiplied by the number of rows in event_members for that event. If I comment out the COUNT(DISTINCT event_members.id) and the event_members LEFT OUTER JOIN, the sum is correct.
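The inflated sum can be reproduced on a tiny data set. The sketch below is only an illustration: it uses Python's built-in sqlite3 module as a stand-in for Postgres, and the column types and sample rows (one event named 'Gala', two profiles, three donations totalling 60, two members) are assumptions that are not part of the question.

```python
import sqlite3

# Hypothetical sample data; column types are assumptions, and SQLite
# stands in for Postgres here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events        (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE profiles      (id INTEGER PRIMARY KEY, event_id INTEGER);
    CREATE TABLE donations     (amount INTEGER, profile_id INTEGER);
    CREATE TABLE event_members (id INTEGER PRIMARY KEY, event_id INTEGER, user_id INTEGER);

    INSERT INTO events        VALUES (1, 'Gala');
    INSERT INTO profiles      VALUES (1, 1), (2, 1);
    INSERT INTO donations     VALUES (10, 1), (20, 1), (30, 2);
    INSERT INTO event_members VALUES (1, 1, 100), (2, 1, 101);
""")

# Same joins as the query above, without aggregation, to count the
# intermediate rows: 3 donation rows x 2 member rows.
joined_rows = conn.execute("""
    SELECT COUNT(*)
    FROM            events
    LEFT OUTER JOIN profiles      ON events.id = profiles.event_id
    LEFT OUTER JOIN donations     ON donations.profile_id = profiles.id
    LEFT OUTER JOIN event_members ON event_members.event_id = events.id
""").fetchone()[0]

# The aggregate query from the question, unchanged.
name, member_count, donation_sum = conn.execute("""
    SELECT events.name, COUNT(DISTINCT event_members.id), SUM(donations.amount)
    FROM            events
    LEFT OUTER JOIN profiles      ON events.id = profiles.event_id
    LEFT OUTER JOIN donations     ON donations.profile_id = profiles.id
    LEFT OUTER JOIN event_members ON event_members.event_id = events.id
    GROUP BY events.name
""").fetchone()

print(joined_rows, member_count, donation_sum)  # prints: 6 2 120
```

Each donation row appears once per member row, so the real total of 60 is reported as 120. COUNT(DISTINCT ...) hides the duplication for the member count, but SUM has no such protection.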

Edit: Erwin pointed me in the right direction. Query rewritten to be:

SELECT events.name, COUNT(DISTINCT event_members.id),
  (SELECT SUM(donations.amount)
   FROM   donations, profiles
   WHERE  donations.profile_id = profiles.id
   AND    profiles.event_id = events.id) AS total_donations
FROM            events
LEFT OUTER JOIN event_members ON event_members.event_id = events.id
-- events.id is grouped too, so the correlated subquery may reference it
GROUP BY events.id, events.name
John P asked Feb 05 '13 07:02



1 Answer

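As I explained in detail under the referenced question, you need to aggregate first and then join the tables to avoid a proxy CROSS JOIN: donations and event_members both hang off events, so joining them before aggregating pairs every donation row with every member row of the same event. Like so: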

SELECT e.name, e.sum_donations, m.ct_members
FROM (
    SELECT e.id, e.name, SUM(d.amount) AS sum_donations
    FROM   events             e
    LEFT   JOIN profiles      p ON p.event_id = e.id
    LEFT   JOIN donations     d ON d.profile_id = p.id
    GROUP  BY 1, 2
    ) e
LEFT   JOIN (
    SELECT event_id, COUNT(DISTINCT id) AS ct_members
    FROM   event_members
    GROUP  BY 1
    ) m ON m.event_id = e.id
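To check the aggregate-first approach, here is a minimal sketch that runs essentially the same query against a small in-memory SQLite database through Python's sqlite3 module. SQLite is only a stand-in for Postgres, and the column types and sample rows (including a second event, 'Picnic', with no donations or members) are assumptions made for the example. The outer derived table is aliased e_sum rather than e, and GROUP BY spells out the column names, purely for readability.

```python
import sqlite3

# Hypothetical sample data; column types are assumptions, and SQLite
# stands in for Postgres here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events        (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE profiles      (id INTEGER PRIMARY KEY, event_id INTEGER);
    CREATE TABLE donations     (amount INTEGER, profile_id INTEGER);
    CREATE TABLE event_members (id INTEGER PRIMARY KEY, event_id INTEGER, user_id INTEGER);

    INSERT INTO events        VALUES (1, 'Gala'), (2, 'Picnic');
    INSERT INTO profiles      VALUES (1, 1), (2, 1);
    INSERT INTO donations     VALUES (10, 1), (20, 1), (30, 2);
    INSERT INTO event_members VALUES (1, 1, 100), (2, 1, 101);
""")

# Each branch is aggregated down to one row per event before the join,
# so donation rows and member rows never multiply each other.
rows = conn.execute("""
    SELECT e_sum.name, e_sum.sum_donations, m.ct_members
    FROM (
        SELECT e.id, e.name, SUM(d.amount) AS sum_donations
        FROM   events         e
        LEFT   JOIN profiles  p ON p.event_id = e.id
        LEFT   JOIN donations d ON d.profile_id = p.id
        GROUP  BY e.id, e.name
        ) e_sum
    LEFT   JOIN (
        SELECT event_id, COUNT(DISTINCT id) AS ct_members
        FROM   event_members
        GROUP  BY event_id
        ) m ON m.event_id = e_sum.id
    ORDER  BY e_sum.id
""").fetchall()

print(rows)  # prints: [('Gala', 60, 2), ('Picnic', None, None)]
```

With this data the 'Gala' event reports the true total of 60 and 2 members. The event without donations or members still appears because of the LEFT JOINs, with NULL (None in Python) for both aggregates.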

If event_members.id is the primary key (as one might assume), you can simplify to:

COUNT(*) AS ct_members

since id is guaranteed to be UNIQUE NOT NULL. That's a little faster.
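A quick sanity check of that equivalence, again as a sketch using Python's sqlite3 module in place of Postgres. The table definition and rows are assumptions for illustration, with id declared as the primary key.

```python
import sqlite3

# Hypothetical data; id is declared PRIMARY KEY, so it is unique and
# never NULL. SQLite stands in for Postgres here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE event_members (id INTEGER PRIMARY KEY, event_id INTEGER, user_id INTEGER);
    INSERT INTO event_members VALUES (1, 1, 100), (2, 1, 101), (3, 2, 100);
""")

# Compare both counting styles per event.
counts = conn.execute("""
    SELECT event_id, COUNT(DISTINCT id) AS ct_distinct, COUNT(*) AS ct_star
    FROM   event_members
    GROUP  BY event_id
    ORDER  BY event_id
""").fetchall()

print(counts)  # prints: [(1, 2, 2), (2, 1, 1)]
```

Both counts agree for every event because no id repeats or is NULL; COUNT(*) simply skips the de-duplication step.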

Erwin Brandstetter answered Sep 17 '22 01:09
