Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

When are (SELECT) queries planned?

In PostgreSQL, when are (SELECT) queries planned?

Is it:

  1. at statement-prepare time, or
  2. at the start of processing the SELECT, or
  3. something else

The reason I ask is that there is a Stackoverflow question: same query, two different ways, vastly different performance

A lot of people seem to be thinking that the query is planned differently because in one case the query contains a string literal ('foo') and in another case it's a placeholder (?).

Now my thinking is that this is a red herring, because the query isn't planned at statement-prepare time, but is actually planned at SELECT time.

So, say, I could prepare a statement with a placeholder, then run the query multiple times with different bound values, and the query planner will be run for each different bound value.

I suspect that the question linked above boils down to the PostgreSQL data type of the value, which in the case of a 'foo' literal is known to be a string, but in the case of a placeholder, the type can't be divined, so is coming through to the query planner as some strange type, which it can't create an efficient plan for. In which case, the issue is not that the query is being planned differently because the value is a placeholder (at statement preparation time) per se but that the value is coming through to the query as a different PostgreSQL type, and that is what is influencing the query planner. To fix this would simply be a matter of binding the placeholder with an appropriate explicit type declaration.

like image 947
zgpmax Avatar asked Feb 14 '12 12:02

zgpmax


1 Answers

I cannot talk about the client-side Perl interface itself but I can shed some light on the PostgreSQL server side.

PostgreSQL has prepared statements and unprepared statements. Unprepared statements are parsed, planned and executed immediately. They also do not support parameter substitution. On a plain psql shell you can show their query plan like this:

tmpdb> explain select * from sometable where flag = true;

On the other hand there are prepared statements: They are usually (see "exception" below) parsed and planned in one step and executed in a second step. They can be re-executed several times with different parameters, because they do support parameter substitution. The equivalent in psql is this:

tmpdb> prepare foo as select * from sometable where flag = $1;
tmpdb> explain execute foo(true);

You may see, that the plan is different from the plan in the unprepared statement, because planning did take place already in the prepare phase as described in the doc for PREPARE:

When the PREPARE statement is executed, the specified statement is parsed, rewritten, and planned. When an EXECUTE command is subsequently issued, the prepared statement need only be executed. Thus, the parsing, rewriting, and planning stages are only performed once, instead of every time the statement is executed.

This also means, that the plan is NOT optimized for the substituted parameters: In the first examples might use an index for flag because PostgreSQL knows that within a million entries only ten have the value true. This reasoning is impossible when PostgreSQL uses a prepared statement. In that case a plan is created which will work for all possible parameter values as good as possible. This might exclude the mentioned index because fetching the better part of the complete table via random access (due to the index) is slower than a plain sequential scan. The PREPARE doc confirms this:

In some situations, the query plan produced for a prepared statement will be inferior to the query plan that would have been chosen if the statement had been submitted and executed normally. This is because when the statement is planned and the planner attempts to determine the optimal query plan, the actual values of any parameters specified in the statement are unavailable. PostgreSQL collects statistics on the distribution of data in the table, and can use constant values in a statement to make guesses about the likely result of executing the statement. Since this data is unavailable when planning prepared statements with parameters, the chosen plan might be suboptimal.

BTW - Regarding plan caching the PREPARE doc also has something to say:

Prepared statements only last for the duration of the current database session. When the session ends, the prepared statement is forgotten, so it must be recreated before being used again.

Also there is no automatic plan caching and no caching/reuse over multiple connections.

EXCEPTION: I have mentioned "usually". The shown psql examples are not the stuff a client adapter like Perl DBI really uses. It uses a certain protocol. Here the term "simple query" corresponds to the "unprepared query" in psql, the term "extended query" corresponds to "prepared query" with one exception: There is a distinction between (one) "unnamed statement" and (possibly multiple) "named statements". Regarding named statements the doc says:

Named prepared statements can also be created and accessed at the SQL command level, using PREPARE and EXECUTE.

and also:

Query planning for named prepared-statement objects occurs when the Parse message is processed.

So in this case planning is done without parameters as described above for PREPARE - nothing new.

The mentioned exception is the "unnamed statement". The doc says:

The unnamed prepared statement is likewise planned during Parse processing if the Parse message defines no parameters. But if there are parameters, query planning occurs every time Bind parameters are supplied. This allows the planner to make use of the actual values of the parameters provided by each Bind message, rather than use generic estimates.

And here is the benefit: Although the unnamed statement is "prepared" (i.e. can have parameter substitution), it also can adapt the query plan to the actual parameters.

BTW: The exact handling of the unnamed statement has changed several times in the past releases of the PostgreSQL server. You can lookup the old docs for details if you really want.

Rationale - Perl / any client:

How a client like Perl uses the protocol is a completely different question. Some clients like the JDBC driver for Java basically say: Even if the programmer uses a prepared statement, the first five (or so) executions are internally mapped to a "simple query" (i.e. effectively unprepared), after that the driver switches to "named statement".

So a client has these choices:

  • Force (re)planning each time by using the "simple query" protocol.
  • Plan once, execute multiple times by using the "extended query" protocol and the "named statement" (plan might be bad because planning is done without parameters).
  • Parse once, plan for each execution (with current PostgreSQL version) by using the "extended query" protocol and the "unnamed statement" and obeying some more things (provide some params during "parse" message)
  • Play completely different tricks like the JDBC driver.

What Perl does currently: I don't know. But the mentioned "red herring" is not very unlikely.

like image 179
A.H. Avatar answered Nov 08 '22 11:11

A.H.