Why does PostgreSQL treat my query differently in a function?

I have a very simple query that is not much more complicated than:

select *
from table_name
where id = 1234

...it takes less than 50 milliseconds to run.

I took that query and put it into a function:

CREATE OR REPLACE FUNCTION pie(id_param integer)
RETURNS SETOF record AS
$BODY$
BEGIN
    RETURN QUERY SELECT *
         FROM table_name
         where id = id_param;
END
$BODY$
LANGUAGE plpgsql STABLE;

Executing this function with select * from pie(123); takes 22 seconds.

If I hard code an integer in place of id_param, the function executes in under 50 milliseconds.

Why does using a parameter in the WHERE clause make my function run so slowly?


Edit to add concrete example:

CREATE TYPE test_type AS (gid integer, geocode character varying(9));

CREATE OR REPLACE FUNCTION geocode_route_by_geocode(geocode_param character)
  RETURNS SETOF test_type AS
$BODY$
BEGIN
RETURN QUERY EXECUTE
    'SELECT     gs.geo_shape_id AS gid,     
        gs.geocode
    FROM geo_shapes gs
    WHERE geocode = $1
    AND geo_type = 1 
    GROUP BY geography, gid, geocode' USING geocode_param;
END;

$BODY$
  LANGUAGE plpgsql STABLE;
ALTER FUNCTION geocode_route_by_geocode(character)
  OWNER TO root;

--Runs in 20 seconds
select * from geocode_route_by_geocode('999xyz');

--Runs in 10 milliseconds
SELECT  gs.geo_shape_id AS gid,     
        gs.geocode
    FROM geo_shapes gs
    WHERE geocode = '9999xyz'
    AND geo_type = 1 
    GROUP BY geography, gid, geocode

Steve Horn asked Feb 22 '23 11:02


1 Answer

Update in PostgreSQL 9.2

There was a major improvement. From the release notes:

Allow the planner to generate custom plans for specific parameter values even when using prepared statements (Tom Lane)

In the past, a prepared statement always had a single "generic" plan that was used for all parameter values, which was frequently much inferior to the plans used for non-prepared statements containing explicit constant values. Now, the planner attempts to generate custom plans for specific parameter values. A generic plan will only be used after custom plans have repeatedly proven to provide no benefit. This change should eliminate the performance penalties formerly seen from use of prepared statements (including non-dynamic statements in PL/pgSQL).
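
On PostgreSQL 12 and later, the generic-versus-custom choice described in these release notes can also be pinned per function with the plan_cache_mode setting. The following is only a sketch under assumptions: pie_demo is a hypothetical demo table (skewed so that one id value covers most rows) and pie_custom is a hypothetical function name. It keeps the static query from the question but attaches SET plan_cache_mode = force_custom_plan, so each call is planned for its actual parameter value.

```sql
-- Hypothetical demo table (pie_demo) standing in for the question's table_name:
-- ids 1..2000 are unique, while 98,000 rows share the value 0.
DROP TABLE IF EXISTS pie_demo;
CREATE TEMP TABLE pie_demo AS
SELECT CASE WHEN g > 2000 THEN 0 ELSE g END AS id,
       'row ' || g AS payload
FROM generate_series(1, 100000) AS g;
CREATE INDEX ON pie_demo (id);
ANALYZE pie_demo;

-- Static-query PL/pgSQL function (hypothetical name pie_custom).
-- The SET clause (PostgreSQL 12+) makes every execution of the cached
-- statement use a custom plan built for the actual id_param value,
-- instead of letting the planner switch to a generic plan.
CREATE OR REPLACE FUNCTION pie_custom(id_param integer)
RETURNS TABLE (id integer, payload text) AS
$BODY$
BEGIN
    RETURN QUERY
    SELECT d.id, d.payload
    FROM   pie_demo d
    WHERE  d.id = id_param;
END
$BODY$
LANGUAGE plpgsql STABLE
SET plan_cache_mode = force_custom_plan;

SELECT * FROM pie_custom(1234);
```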


Original answer for PostgreSQL 9.1 or older

A PL/pgSQL function has a similar effect to the PREPARE statement: its queries are parsed and the query plan is cached.

The advantage is that some overhead is saved for every call.
The disadvantage is that the query plan is not optimized for the particular parameter values it is called with.
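
The PREPARE analogy can be checked directly with EXPLAIN EXECUTE. This is a sketch under assumptions: pie_demo and pie_stmt are hypothetical names, the data is deliberately skewed, and plan_cache_mode requires PostgreSQL 12 or later. It is used here only to reproduce a single generic plan, like the one older versions always used, so it can be compared with the plans built for literal values.

```sql
-- Hypothetical demo table (pie_demo) standing in for the question's table_name:
-- ids 1..2000 are unique, while 98,000 rows share the value 0.
DROP TABLE IF EXISTS pie_demo;
CREATE TEMP TABLE pie_demo AS
SELECT CASE WHEN g > 2000 THEN 0 ELSE g END AS id,
       'row ' || g AS payload
FROM generate_series(1, 100000) AS g;
CREATE INDEX ON pie_demo (id);
ANALYZE pie_demo;

-- A static query in a PL/pgSQL function is prepared much like this statement.
DEALLOCATE ALL;
PREPARE pie_stmt(integer) AS
    SELECT * FROM pie_demo WHERE id = $1;

-- PostgreSQL 12+: force one generic plan, planned without knowing the value.
SET plan_cache_mode = force_generic_plan;
EXPLAIN EXECUTE pie_stmt(1234);
EXPLAIN EXECUTE pie_stmt(0);      -- same generic plan, even for the hot value

-- Plans built for literal values can differ from value to value.
EXPLAIN SELECT * FROM pie_demo WHERE id = 1234;
EXPLAIN SELECT * FROM pie_demo WHERE id = 0;

RESET plan_cache_mode;
```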

For queries on tables with an even data distribution, this is generally no problem, and PL/pgSQL functions will perform somewhat faster than raw SQL queries or SQL functions. But if your query can use certain indexes depending on the actual values in the WHERE clause or, more generally, could choose a better query plan for the particular values, you may end up with a sub-optimal query plan. Try an SQL function, or use dynamic SQL with EXECUTE to force the query to be re-planned for every call. It could look like this:

CREATE OR REPLACE FUNCTION pie(id_param integer)
RETURNS SETOF record AS
$BODY$
BEGIN        
    RETURN QUERY EXECUTE
        'SELECT *
         FROM   table_name
         where  id = $1'
    USING id_param;
END
$BODY$
LANGUAGE plpgsql STABLE;
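
For the SQL-function alternative, a minimal sketch might look like the following. pie_demo and pie_sql are hypothetical names, and RETURNS TABLE is used instead of RETURNS SETOF record so the function can be called without a column definition list.

```sql
-- Hypothetical demo table (pie_demo) standing in for the question's table_name.
DROP TABLE IF EXISTS pie_demo;
CREATE TEMP TABLE pie_demo AS
SELECT CASE WHEN g > 2000 THEN 0 ELSE g END AS id,
       'row ' || g AS payload
FROM generate_series(1, 100000) AS g;
CREATE INDEX ON pie_demo (id);
ANALYZE pie_demo;

-- SQL-language version of pie() under a hypothetical name, pie_sql.
-- RETURNS TABLE declares the result columns, so callers need no
-- column definition list (unlike RETURNS SETOF record).
-- Referring to id_param by name requires PostgreSQL 9.2+; use $1 on older versions.
CREATE OR REPLACE FUNCTION pie_sql(id_param integer)
RETURNS TABLE (id integer, payload text) AS
$BODY$
    SELECT d.id, d.payload
    FROM   pie_demo d
    WHERE  d.id = id_param;
$BODY$
LANGUAGE sql STABLE;

SELECT * FROM pie_sql(1234);
```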

Edit after comment:

If this variant does not change the execution time, there must be other factors at play that you may have missed or did not mention. Different database? Different parameter values? You would have to post more details.

Here is a quote from the manual to back up my statements above:

An EXECUTE with a simple constant command string and some USING parameters, as in the first example above, is functionally equivalent to just writing the command directly in PL/pgSQL and allowing replacement of PL/pgSQL variables to happen automatically. The important difference is that EXECUTE will re-plan the command on each execution, generating a plan that is specific to the current parameter values; whereas PL/pgSQL normally creates a generic plan and caches it for re-use. In situations where the best plan depends strongly on the parameter values, EXECUTE can be significantly faster; while when the plan is not sensitive to parameter values, re-planning will be a waste.

Erwin Brandstetter answered Mar 06 '23 03:03