
Smart logic queries performance inside functions for PostgreSQL

Consider the following SQL query:

SELECT a,b,c
FROM t
WHERE (id1 = :p_id1 OR :p_id1 IS NULL) AND (id2 = :p_id2 OR :p_id2 IS NULL)

Markus Winand, in his book "SQL Performance Explained", calls this approach one of the worst performance anti-patterns of all and explains why: the database has to prepare a plan for the worst case, in which all filters are disabled.

But later he also writes that for PostgreSQL this problem occurs only when a statement (PreparedStatement) handle is re-used.

Now assume that the query above is wrapped in a function, something like:

CREATE FUNCTION func(IN p_id1 BIGINT,IN p_id2 BIGINT)
...
 $BODY$
  BEGIN
     ...
  END;
 $BODY$

There are a few points I still don't understand:

  1. Will this problem still occur when the query is wrapped in a function? (I tried to look at the execution plan for the function call, but Postgres doesn't show details for the statements inside the function, even with SET auto_explain.log_nested_statements = ON.)

  2. Let's say I'm working on a legacy project and cannot change the function itself, only the Java code that calls it. Would it be better to avoid a prepared statement here and use a dynamic query each time? (Assume the execution time is quite long, up to several seconds.) For example, this probably ugly approach:


getSession().doWork(connection -> {
    ResultSet rs = connection.createStatement().executeQuery("select * from func("+id1+","+id2+")");
    ...
})
Andremoniy asked Oct 30 '22 08:10


1 Answer

1. It depends.

Without prepared statements, PostgreSQL plans the query anew on every execution, using the actual parameter values. This is known as a custom plan.

With prepared statements (and you're right, PL/pgSQL functions do use prepared statements) it's more complicated. PostgreSQL prepares the statement (parses its text and stores the parse tree) but still re-plans it each time it is executed. Custom plans are generated for at least the first 5 executions. After that, the planner considers using a generic plan (i.e. one that does not depend on parameter values) if its cost is less than the average cost of the custom plans generated so far.

Note that the cost of a plan is the planner's estimate, not actual I/O operations or CPU cycles.
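To make the rule above concrete, here is a toy model of it in Java. It is only a sketch of the rule as described in this answer, not PostgreSQL's actual code. The class name, method name and all cost numbers are made up for illustration, and the custom cost estimate is assumed to already include the cost of planning.

```java
/** Toy model of the plan-choice rule described above; not PostgreSQL source code. */
final class PlanChoiceModel {
    // Number of executions that always get a custom plan (from the answer text).
    private static final int CUSTOM_PLAN_RUNS = 5;

    private int customPlans = 0;          // custom plans generated so far
    private double totalCustomCost = 0.0; // sum of their estimated costs

    /**
     * Decides which plan a single execution would use.
     * Both arguments are planner estimates, not measured run times.
     *
     * @return true for the generic plan, false for a fresh custom plan
     */
    boolean useGenericPlan(double customCostEstimate, double genericCostEstimate) {
        if (customPlans >= CUSTOM_PLAN_RUNS) {
            double avgCustomCost = totalCustomCost / customPlans;
            if (genericCostEstimate < avgCustomCost) {
                return true; // generic plan looks cheaper than the average custom plan
            }
        }
        // Otherwise plan with the actual parameter values and remember the cost.
        customPlans++;
        totalCustomCost += customCostEstimate;
        return false;
    }
}
```

With hypothetical estimates of 18 for a custom plan that uses a selective id filter and 1000 for the generic worst-case plan, the model keeps choosing custom plans forever, because 1000 is never below the average of 18. The "bad luck" case is when the first five calls all pass NULLs: their custom plans are estimated at, say, 1010 each, the generic plan at 1000 now looks cheaper, and a later call with a selective id gets the generic plan as well.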

So the problem can occur, but you need some bad luck for that.

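2. The approach you suggested will not work, because it doesn't change the behavior of the function: the query inside it is still prepared and cached by PL/pgSQL, no matter how the call itself is sent.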

In general, not using parameters is less of a problem in PostgreSQL than in, e.g., Oracle, because PostgreSQL has no shared plan cache. Prepared plans are stored in each backend's memory, so re-planning does not affect other sessions.

But as far as I know, there is currently no way to force the planner to use custom plans (other than reconnecting after 5 executions...).
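PostgreSQL 12 and later add a plan_cache_mode setting that also applies to statements prepared implicitly by PL/pgSQL. Assuming such a server version, a caller that can only change Java code might try something like the sketch below. The class and method names are hypothetical, func is the function from the question, and counting rows is just a placeholder for the real result handling.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.sql.Types;

/** Sketch: ask the session for custom plans, then call the function. Assumes PostgreSQL 12+. */
final class ForceCustomPlanCall {
    // plan_cache_mode exists only in PostgreSQL 12 and later.
    static final String FORCE_CUSTOM_PLAN = "SET plan_cache_mode = force_custom_plan";
    static final String CALL_FUNC = "SELECT * FROM func(?, ?)";

    /** Binds a BIGINT parameter that may be NULL (a disabled filter). */
    static void bindNullableLong(PreparedStatement ps, int index, Long value) throws SQLException {
        if (value == null) {
            ps.setNull(index, Types.BIGINT);
        } else {
            ps.setLong(index, value);
        }
    }

    /** Calls func(id1, id2) and returns the number of rows (placeholder for real processing). */
    static int countRows(Connection connection, Long id1, Long id2) throws SQLException {
        // Session-level setting: it stays in effect until changed or the session ends.
        try (Statement st = connection.createStatement()) {
            st.execute(FORCE_CUSTOM_PLAN);
        }
        try (PreparedStatement ps = connection.prepareStatement(CALL_FUNC)) {
            bindNullableLong(ps, 1, id1);
            bindNullableLong(ps, 2, id2);
            try (ResultSet rs = ps.executeQuery()) {
                int rows = 0;
                while (rs.next()) {
                    rows++;
                }
                return rows;
            }
        }
    }
}
```

Inside getSession().doWork(connection -> ...) this would be called as ForceCustomPlanCall.countRows(connection, id1, id2). On a pooled connection the session-level SET outlives the call, so SET LOCAL inside a transaction may be the safer variant. On servers older than 12 the SET itself fails because the parameter does not exist.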

Egor Rogov answered Nov 15 '22 05:11