Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why can't PostgreSQL do this simple FULL JOIN?

Here's a minimal setup with 2 tables a and b each with 3 rows:

CREATE TABLE a (
    id SERIAL PRIMARY KEY,
    value TEXT
);
CREATE INDEX ON a (value);

CREATE TABLE b (
    id SERIAL PRIMARY KEY,
    value TEXT
);
CREATE INDEX ON b (value);

INSERT INTO a (value) VALUES ('x'), ('y'),        (NULL);
INSERT INTO b (value) VALUES        ('y'), ('z'), (NULL);

Here is a LEFT JOIN that works fine as expected:

SELECT * FROM a
LEFT JOIN b ON a.value IS NOT DISTINCT FROM b.value;

with output:

 id | value | id | value 
----+-------+----+-------
  1 | x     |    | 
  2 | y     |  1 | y
  3 |       |  3 | 
(3 rows)

Changing "LEFT JOIN" to "FULL JOIN" gives an error:

SELECT * FROM a
FULL JOIN b ON a.value IS NOT DISTINCT FROM b.value;

ERROR: FULL JOIN is only supported with merge-joinable or hash-joinable join conditions

Can someone please answer:

What is a "merge-joinable or hash-joinable join condition" and why joining on a.value IS NOT DISTINCT FROM b.value doesn't fulfill this condition, but a.value = b.value is perfectly fine?

It seems that the only difference is how NULL values are handled. Since the value column is indexed in both tables, running an EXPLAIN on a NULL lookup is just as efficient as looking up values that are non-NULL:

EXPLAIN SELECT * FROM a WHERE value = 'x';
                                QUERY PLAN                                
--------------------------------------------------------------------------
 Bitmap Heap Scan on a  (cost=4.20..13.67 rows=6 width=36)
   Recheck Cond: (value = 'x'::text)
   ->  Bitmap Index Scan on a_value_idx  (cost=0.00..4.20 rows=6 width=0)
         Index Cond: (value = 'x'::text)


EXPLAIN SELECT * FROM a WHERE value ISNULL;
                                QUERY PLAN                                
--------------------------------------------------------------------------
 Bitmap Heap Scan on a  (cost=4.20..13.65 rows=6 width=36)
   Recheck Cond: (value IS NULL)
   ->  Bitmap Index Scan on a_value_idx  (cost=0.00..4.20 rows=6 width=0)
         Index Cond: (value IS NULL)

This has been tested with PostgreSQL 9.6.3 and 10beta1.

There has been discussion about this issue, but it doesn't directly answer the above question.

like image 852
Matt Avatar asked May 28 '17 20:05

Matt


People also ask

Does PostgreSQL support full join?

PostgreSQL supports inner join, left join, right join, full outer join, cross join, natural join, and a special kind of join called self-join.

Does Postgres have full outer join?

The PostgreSQL FULL OUTER JOIN or FULL JOIN creates the result-set by combining the result of both LEFT JOIN and RIGHT JOIN. The result-set will contain all the rows from both the tables. The rows for which there is no matching, the result-set will contain NULL values.

How do I join two queries in PostgreSQL?

To combine the result sets of two queries using the UNION operator, the queries must conform to the following rules: The number and the order of the columns in the select list of both queries must be the same. The data types must be compatible.


2 Answers

PostgreSQL implements FULL OUTER JOIN with either a hash or a merge join.

To be eligible for such a join, the join condition has to have the form

<expression using only left table> <operator> <expression using only right table>

Now your join condition does look like this, but PostgreSQL does not have a special IS NOT DISTINCT FROM operator, so it parses your condition into:

(NOT ($1 IS DISTINCT FROM $2))

And such an expression cannot be used for hash or merge joins, hence the error message.

I can think of a way to work around it:

SELECT a_id, NULLIF(a_value, '<null>'),
       b_id, NULLIF(b_value, '<null>')
FROM (SELECT id AS a_id,
             COALESCE(value, '<null>') AS a_value
      FROM a
     ) x
   FULL JOIN
     (SELECT id AS b_id,
             COALESCE(value, '<null>') AS b_value
      FROM b
     ) y
      ON x.a_value = y.b_value;

That works if <null> does not appear anywhere in the value columns.

like image 126
Laurenz Albe Avatar answered Nov 01 '22 16:11

Laurenz Albe


I just solved such a case by replacing the ON condition with "TRUE", and moving the original "ON" condition into a WHERE clause. I don't know the performance impact of this, though.

like image 34
David Avatar answered Nov 01 '22 18:11

David