
Does PostgreSQL short-circuit its BOOL_OR() evaluation?

EXISTS is faster than COUNT(*) because it can be short-circuited

I often need to check whether something exists in SQL. For instance:

-- PostgreSQL syntax, SQL standard syntax:
SELECT EXISTS (SELECT .. FROM some_table WHERE some_boolean_expression)

-- Oracle syntax
SELECT CASE 
  WHEN EXISTS (SELECT .. FROM some_table WHERE some_boolean_expression) THEN 1 
  ELSE 0 
END
FROM dual

In most databases, EXISTS is "short-circuited", i.e. the database can stop looking for rows in the table as soon as it has found one row. This is usually much faster than comparing COUNT(*) >= 1 as can be seen in this blog post.
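As a minimal sketch of the two forms being compared, assuming a hypothetical temporary table exists_demo with a boolean column flag (neither is part of the question), both queries below return the same boolean, but only the first lets the database stop at the first matching row:

```sql
-- Hypothetical demo table, for illustration only
CREATE TEMP TABLE exists_demo (
  id        int PRIMARY KEY,
  something int NOT NULL,
  flag      boolean NOT NULL
);

-- 100000 rows, of which every 1000th has flag = true
INSERT INTO exists_demo (id, something, flag)
SELECT i, i % 10, i % 1000 = 0
FROM generate_series(1, 100000) AS s(i);

-- EXISTS: the subquery can stop after the first row that satisfies the predicate
SELECT EXISTS (SELECT 1 FROM exists_demo WHERE flag);

-- COUNT(*): every matching row is counted before the comparison is made
SELECT (SELECT count(*) FROM exists_demo WHERE flag) >= 1;
```

Prefixing either query with EXPLAIN (ANALYZE) is one way to compare the work actually done on your own data.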

Using EXISTS with GROUP BY

Sometimes, I'd like to do this for each group in a GROUP BY query, i.e. I'd like to "aggregate" the existence value. There's no EXISTS aggregate function, but PostgreSQL luckily supports the BOOL_OR() aggregate function, like in this statement:

SELECT something, bool_or (some_boolean_expression)
FROM some_table
GROUP BY something

The documentation mentions something about COUNT(*) being slow because of the obvious sequential scan needed to calculate the count. But unfortunately, it doesn't say anything about BOOL_OR() being short-circuited. Is it the case? Does BOOL_OR() stop aggregating new values as soon as it encounters the first TRUE value?
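One way to look into this yourself is to inspect the executed plan. The sketch below is only an assumption about how one might test it: it uses generate_series() as a stand-in for some_table and made-up expressions for the group and the predicate. The actual row count reported for the scan node below the aggregate shows how many input rows were read to compute BOOL_OR():

```sql
-- Stand-in data: 100000 generated rows, 10 groups,
-- and a predicate that is true for every 1000th row
EXPLAIN (ANALYZE, COSTS OFF)
SELECT i % 10 AS something, bool_or(i % 1000 = 0)
FROM generate_series(1, 100000) AS s(i)
GROUP BY i % 10;
```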

asked Sep 26 '16 06:09 by Lukas Eder


1 Answer

To check for existence, I generally use a LIMIT / FETCH FIRST 1 ROW ONLY query:

SELECT .. FROM some_table WHERE some_boolean_expression
FETCH FIRST 1 ROW ONLY

This generally stops execution after the first hit.

The same technique can be applied using LATERAL for each row (group) from another table.

SELECT * 
  FROM (SELECT something
          FROM some_table
         GROUP BY something
       ) t1
  LEFT JOIN LATERAL (SELECT ...
                        FROM ...
                       WHERE ...
                       FETCH FIRST 1 ROW ONLY) t2
    ON (true)

In t2, use a WHERE clause that correlates with t1 and matches any row of the group. The subquery is executed once per group and stops as soon as the first hit is found. Whether this performs better or worse than BOOL_OR() depends on your search predicates and indexing, of course.
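A filled-in version of the pattern might look like the following. This is a sketch under assumptions: the temporary table lateral_demo, its columns, the partial index, and the output column any_match are all hypothetical names chosen for illustration, not taken from the question.

```sql
-- Hypothetical demo table, for illustration only
CREATE TEMP TABLE lateral_demo (
  something int NOT NULL,
  flag      boolean NOT NULL
);

INSERT INTO lateral_demo (something, flag)
SELECT i % 10, i % 1000 = 0
FROM generate_series(1, 100000) AS s(i);

-- Optional: a partial index so each per-group probe can find a hit quickly
CREATE INDEX ON lateral_demo (something) WHERE flag;

SELECT t1.something,
       t2.hit IS NOT NULL AS any_match   -- true if the group has at least one matching row
  FROM (SELECT something
          FROM lateral_demo
         GROUP BY something
       ) t1
  LEFT JOIN LATERAL (SELECT true AS hit
                        FROM lateral_demo d
                       WHERE d.something = t1.something  -- correlate with the current group
                         AND d.flag                      -- the boolean expression being tested
                       FETCH FIRST 1 ROW ONLY) t2
    ON (true);
```

The LEFT JOIN keeps groups without any hit, so any_match plays the role that bool_or(flag) would play in the GROUP BY query.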

answered Oct 11 '22 09:10 by Markus Winand