Postgres complicated full outer join keeping nulls from the "on" column

I have written a PostgreSQL query that is relatively performant at scale and gives me the dataset I want back, but I am wondering if it is the simplest/best way to write the query. It seems like there should be a simpler join operation that satisfies the conditions I need.

EDIT: I do need this to be performant on large tables. In the example given below, pets is 150 million rows, food is roughly 100k rows. My solution at the bottom clocks in at about 0.6ms. Both tables have an index on id and user_id. The food table also includes an index on pet_id.

I have two related tables in my system with one guaranteed shared attribute: the user_id. Here is an example that shows the essence of my problem:

Pets

+------+-------+---------+
|  id  | type  | user_id |
+------+-------+---------+
| 1234 | dog   |       1 |
| 1235 | cat   |       1 |
| 1236 | gecko |       1 |
+------+-------+---------+

Food

+------+-----------+---------+--------+
|  id  |   name    | user_id | pet_id |
+------+-----------+---------+--------+
| 4321 | hamburger |       1 | NULL   |
| 4322 | dog food  |       1 | 1234   |
| 4323 | cat food  |       1 | 1235   |
+------+-----------+---------+--------+

Desired Results

+------+------+
| p.id | f.id |
+------+------+
| NULL | 4321 |  --no pet, hamburger
| 1234 | 4322 |  --dog, dog food
| 1235 | 4323 |  --cat, cat food
| 1236 | NULL |  --gecko, no food
+------+------+

Now that there is an example to refer to, let me make the desired result clear. It contains all rows from both tables that belong to my user_id (imagine that the tables could contain thousands of other rows that don't belong to user_id 1). Each row should appear exactly once: paired with its matching row from the other table when one exists, and with NULL otherwise.

An example of a full outer join that I tried to make this work:

SELECT p.id, f.id
FROM pets p FULL OUTER JOIN food f ON p.user_id = f.user_id
WHERE p.user_id = 1;

This query has two problems:

  1. The WHERE filter drops rows where the left (pets) side is NULL. I need those.
  2. Because user_id is essentially a constant here, joining on it matches every row from the left to every row from the right, so I end up with plenty of duplicates. That's not what I need. I need a one-to-one match.
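
To make the second problem concrete, here is a minimal sketch that inlines the sample rows above as CTEs (so it does not touch the real tables) and counts what the user_id-only join produces. With three pets and three food rows all sharing user_id 1, every pet pairs with every food row:

```sql
-- Sample data from the tables above, inlined as CTEs for illustration only.
WITH pets (id, type, user_id) AS (
    VALUES (1234, 'dog',   1),
           (1235, 'cat',   1),
           (1236, 'gecko', 1)
),
food (id, name, user_id, pet_id) AS (
    VALUES (4321, 'hamburger', 1, NULL::int),
           (4322, 'dog food',  1, 1234),
           (4323, 'cat food',  1, 1235)
)
SELECT count(*) AS joined_rows   -- 9 = 3 pets x 3 food rows, not the 4 rows wanted
FROM pets p
FULL OUTER JOIN food f ON p.user_id = f.user_id
WHERE p.user_id = 1;
```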

I could fix #1 by including an OR in the WHERE filter:

SELECT p.id, f.id
FROM pets p FULL OUTER JOIN food f ON p.user_id = f.user_id
WHERE p.user_id = 1 OR f.user_id = 1;

For reasons I'm not completely sure of, this makes the query take a very long time. In our system, both tables have an index on user_id, so a missing index isn't the cause.

To solve my issue, I landed on the following query (really two combined):

SELECT p.id, f.id
FROM pets p
LEFT JOIN food f
    ON p.id = f.pet_id AND f.user_id = 1
WHERE p.user_id = 1
UNION
SELECT p.id, f.id
FROM pets p
RIGHT JOIN food f
    ON p.id = f.pet_id
WHERE f.user_id = 1 AND p.id IS NULL;
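
A possible refinement, sketched under the assumption that pets.id and food.id are unique keys: the first branch always returns a non-NULL p.id and the second returns only rows where p.id IS NULL, so the two branches cannot produce the same row. In that case UNION ALL should return the same rows as UNION while skipping the deduplication step. The sample rows are inlined as CTEs here, and the user_id 2 rows are hypothetical additions to show that other users are filtered out:

```sql
WITH pets (id, type, user_id) AS (
    VALUES (1234, 'dog',    1),
           (1235, 'cat',    1),
           (1236, 'gecko',  1),
           (2000, 'parrot', 2)           -- hypothetical row for another user
),
food (id, name, user_id, pet_id) AS (
    VALUES (4321, 'hamburger', 1, NULL::int),
           (4322, 'dog food',  1, 1234),
           (4323, 'cat food',  1, 1235),
           (5000, 'seeds',     2, 2000)  -- hypothetical row for another user
)
-- Branch 1: every pet of user 1, with its food if there is one.
SELECT p.id AS pet_id, f.id AS food_id
FROM pets p
LEFT JOIN food f
    ON p.id = f.pet_id AND f.user_id = 1
WHERE p.user_id = 1
UNION ALL
-- Branch 2: food of user 1 that has no matching pet.
SELECT p.id, f.id
FROM pets p
RIGHT JOIN food f
    ON p.id = f.pet_id
WHERE f.user_id = 1 AND p.id IS NULL
ORDER BY pet_id NULLS FIRST;
```

On this inlined data the query returns the four rows listed under Desired Results. Whether it is measurably faster on the real tables is something only EXPLAIN ANALYZE there can tell.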

So my question is this: Is there a simpler way to execute this as a single query?

Justin R. asked Feb 04 '26 05:02



1 Answer

SQL DEMO

SELECT p.id, f.id
FROM pets p 
FULL OUTER JOIN food f 
  ON p.user_id = f.user_id
 AND p.id = f.pet_id
 AND p.user_id = 1;

OUTPUT

|     id |     id |
|--------|--------|
|   1234 |   4322 |
|   1235 |   4323 |
|   1236 | (null) |
| (null) |   4321 |

NOTE:

You should add a composite index on the join columns of each table: (user_id, id) on pets and (user_id, pet_id) on food.
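
One caveat worth checking before running this against the full tables: in a FULL OUTER JOIN, a condition in the ON clause decides only which rows match. Rows from either side that fail it are still returned, padded with NULLs. The demo data contains only user_id 1, so this does not show up in the output above. The sketch below inlines the sample rows as CTEs, adds two hypothetical rows for user_id 2, and filters each side in a subquery before joining on pet_id. Treat it as one possible variant under those assumptions, not a benchmarked replacement:

```sql
WITH pets (id, type, user_id) AS (
    VALUES (1234, 'dog',    1),
           (1235, 'cat',    1),
           (1236, 'gecko',  1),
           (2000, 'parrot', 2)           -- hypothetical row for another user
),
food (id, name, user_id, pet_id) AS (
    VALUES (4321, 'hamburger', 1, NULL::int),
           (4322, 'dog food',  1, 1234),
           (4323, 'cat food',  1, 1235),
           (5000, 'seeds',     2, 2000)  -- hypothetical row for another user
)
SELECT p.id AS pet_id, f.id AS food_id
FROM (SELECT id FROM pets WHERE user_id = 1) p               -- filter pets first
FULL OUTER JOIN
     (SELECT id, pet_id FROM food WHERE user_id = 1) f       -- filter food first
    ON p.id = f.pet_id
ORDER BY pet_id NULLS FIRST;
```

On this data the query returns only the four user_id 1 rows. With the filter kept in the ON clause instead, the two user_id 2 rows would also come back as (2000, NULL) and (NULL, 5000), because neither can satisfy p.user_id = 1 and so both are emitted as unmatched.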

Juan Carlos Oropeza answered Feb 05 '26 18:02
