Two SQL LEFT JOINS produce incorrect result

I have 3 tables:

users(id, account_balance)
grocery(user_id, date, amount_paid)
fishmarket(user_id, date, amount_paid)

Both the fishmarket and grocery tables may have multiple rows for the same user_id, with different dates and amounts paid, or no rows at all for a given user. When I try the following query:

SELECT
     t1."id" AS "User ID",
     t1.account_balance AS "Account Balance",
     count(t2.user_id) AS "# of grocery visits",
     count(t3.user_id) AS "# of fishmarket visits"
FROM users t1
LEFT OUTER JOIN grocery t2 ON (t2.user_id=t1."id") 
LEFT OUTER JOIN fishmarket t3 ON (t3.user_id=t1."id") 
GROUP BY t1.account_balance,t1.id
ORDER BY t1.id

It produces incorrect results: "1", "12", "12".
But when I LEFT JOIN to just one table, it produces correct results for either grocery or fishmarket visits, which are "1", "3", "4".

What am I doing wrong here?
I am using PostgreSQL 9.1.

asked Sep 17 '12 17:09

Ryan Bostwick

3 Answers

Joins are processed left to right (unless parentheses dictate otherwise). If you LEFT JOIN (or just JOIN, similar effect) three groceries to one user, you get 3 rows (1 x 3). If you then join 4 fishmarkets for the same user, you get 12 rows (3 x 4): the second join multiplies the previous row count instead of adding to it, as you may have hoped.
Both counts therefore come out as 12, because the visits for groceries and fishmarkets are multiplied alike.
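
The multiplication is easy to reproduce on a toy data set. The following is only a sketch under stated assumptions: it uses Python's built-in sqlite3 module with an in-memory database as a stand-in for PostgreSQL, and the sample rows (one user, three grocery visits, four fishmarket visits) are invented to match the numbers in the question.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL here (an assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users      (id INTEGER, account_balance INTEGER);
    CREATE TABLE grocery    (user_id INTEGER, date TEXT, amount_paid INTEGER);
    CREATE TABLE fishmarket (user_id INTEGER, date TEXT, amount_paid INTEGER);

    -- Hypothetical sample data: one user, 3 grocery visits, 4 fishmarket visits.
    INSERT INTO users VALUES (1, 100);
    INSERT INTO grocery VALUES
        (1, '2012-09-01', 10), (1, '2012-09-02', 20), (1, '2012-09-03', 30);
    INSERT INTO fishmarket VALUES
        (1, '2012-09-01', 5), (1, '2012-09-02', 6),
        (1, '2012-09-03', 7), (1, '2012-09-04', 8);
""")

# Rows after joining only grocery: 1 user x 3 visits.
rows_after_first_join = conn.execute("""
    SELECT count(*)
    FROM users u
    LEFT JOIN grocery g ON g.user_id = u.id
""").fetchone()[0]

# Rows after joining fishmarket as well: each of the 3 rows matches 4 visits.
rows_after_both_joins = conn.execute("""
    SELECT count(*)
    FROM users u
    LEFT JOIN grocery    g ON g.user_id = u.id
    LEFT JOIN fishmarket f ON f.user_id = u.id
""").fetchone()[0]

# The query from the question: both counts see every row of the multiplied result.
double_join_counts = conn.execute("""
    SELECT t1."id" AS "User ID",
           t1.account_balance AS "Account Balance",
           count(t2.user_id) AS "# of grocery visits",
           count(t3.user_id) AS "# of fishmarket visits"
    FROM users t1
    LEFT OUTER JOIN grocery t2 ON (t2.user_id = t1."id")
    LEFT OUTER JOIN fishmarket t3 ON (t3.user_id = t1."id")
    GROUP BY t1.account_balance, t1.id
    ORDER BY t1.id
""").fetchall()

print(rows_after_first_join, rows_after_both_joins, double_join_counts)
```

With this sample data the row counts are 3 and 12, and the query from the question reports 12 for both visit columns.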

You can make it work like this:

SELECT u.id
     , u.account_balance
     , g.grocery_visits
     , f.fishmarket_visits
FROM   users u
LEFT   JOIN (
   SELECT user_id, count(*) AS grocery_visits
   FROM   grocery
   GROUP  BY user_id
   ) g ON g.user_id = u.id
LEFT   JOIN (
   SELECT user_id, count(*) AS fishmarket_visits
   FROM   fishmarket
   GROUP  BY user_id
   ) f ON f.user_id = u.id
ORDER  BY u.id;

To get aggregated values for one or a few users, correlated subqueries like the ones @Vince provided are just fine. For a whole table or major parts of it, it is (much) more efficient to aggregate each n-side table first and join to the result once. This way, we also do not need another GROUP BY in the outer query.

grocery_visits and fishmarket_visits are NULL for users without any related entries in the respective tables. If you need 0 instead (or any arbitrary number), use COALESCE in the outer SELECT:

SELECT u.id
     , u.account_balance
     , COALESCE(g.grocery_visits   , 0) AS grocery_visits
     , COALESCE(f.fishmarket_visits, 0) AS fishmarket_visits
FROM   ...
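
To see the aggregate-then-join query and COALESCE working together, here is a small runnable sketch. It rests on assumptions that are not part of the original answer: Python's built-in sqlite3 module with an in-memory database replaces PostgreSQL, and the sample data (user 1 with visits, user 2 with none) is hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL here (an assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users      (id INTEGER, account_balance INTEGER);
    CREATE TABLE grocery    (user_id INTEGER, date TEXT, amount_paid INTEGER);
    CREATE TABLE fishmarket (user_id INTEGER, date TEXT, amount_paid INTEGER);

    -- Hypothetical sample data: user 1 has visits, user 2 has none.
    INSERT INTO users VALUES (1, 100), (2, 50);
    INSERT INTO grocery VALUES
        (1, '2012-09-01', 10), (1, '2012-09-02', 20), (1, '2012-09-03', 30);
    INSERT INTO fishmarket VALUES
        (1, '2012-09-01', 5), (1, '2012-09-02', 6),
        (1, '2012-09-03', 7), (1, '2012-09-04', 8);
""")

# Aggregate each visit table down to one row per user, then join once.
# COALESCE turns the NULL of users without visits into 0.
AGGREGATE_THEN_JOIN = """
SELECT u.id
     , u.account_balance
     , COALESCE(g.grocery_visits   , 0) AS grocery_visits
     , COALESCE(f.fishmarket_visits, 0) AS fishmarket_visits
FROM   users u
LEFT   JOIN (
   SELECT user_id, count(*) AS grocery_visits
   FROM   grocery
   GROUP  BY user_id
   ) g ON g.user_id = u.id
LEFT   JOIN (
   SELECT user_id, count(*) AS fishmarket_visits
   FROM   fishmarket
   GROUP  BY user_id
   ) f ON f.user_id = u.id
ORDER  BY u.id
"""

visit_counts = conn.execute(AGGREGATE_THEN_JOIN).fetchall()
for row in visit_counts:
    print(row)
```

Because each subquery returns at most one row per user, the joins cannot multiply rows: user 1 comes back with 3 and 4 visits, and user 2 with 0 and 0.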

answered Nov 08 '22 12:11

Erwin Brandstetter

For your original query, if you take away the GROUP BY to look at the pre-grouped result, you'll see how the counts you were receiving came about.

Perhaps the following query utilizing subqueries would achieve your intended result:

SELECT
 t1."id" AS "User ID",
 t1.account_balance AS "Account Balance",
 (SELECT count(*) FROM grocery     t2 WHERE t2.user_id=t1."id") AS "# of grocery visits",
 (SELECT count(*) FROM fishmarket  t3 WHERE t3.user_id=t1."id") AS "# of fishmarket visits"
FROM users t1
ORDER BY t1.id

answered Nov 08 '22 13:11

Vince Perta

It's because when the user table joins to the grocery table, there are 3 records matched. Then each of those three records matches with the 4 records in fishmarket, producing 12 records. You need subqueries to get what you are looking for.

answered Nov 08 '22 14:11

Tobsey

Tags: sql, postgresql, left-join, aggregate-functions