I have stumbled upon very weird issue with my SQL functions. They appear to have different execution plans between function language SQL
and language plpgsql
but I cannot tell what execution plan is set for SQL
version, since it requires this: Function's final statement must be SELECT or INSERT/UPDATE/DELETE RETURNING.
and won't let me use EXPLAIN
.
As for why I know they have different plans, it is because SQL
version fails to execute, complaining it cannot connect to one of the foreign servers that is currently taken down. Connection is done using foreign tables and that table is partitioned by date (column date_col
) with some of its partitions being physically on the same server and some on foreign. Date parameter used in function makes sure it should only scan one partition and that partition is on the same server. This is also shown in explain
below used on plain SQL
(not in function):
Append (cost=2.77..39.52 rows=2 width=36)
CTE ct
-> Result (cost=0.00..0.51 rows=100 width=4)
InitPlan 2 (returns $1)
-> Aggregate (cost=2.25..2.26 rows=1 width=32)
-> CTE Scan on ct (cost=0.00..2.00 rows=100 width=4)
-> Seq Scan on table1 (cost=0.00..0.00 rows=1 width=36)
Filter: ((date_col = '2017-07-30'::date) AND (some_col = ANY ($1)))
-> Seq Scan on "part$_table1_201707" (cost=0.00..36.75 rows=1 width=36)
Filter: ((date_col = '2017-07-30'::date) AND (some_col = ANY ($1)))
Foreign partitions are before year 2017 and it shows that planner chooses correct partition and does not bother scanning any others. This is true for plain SQL
and plpgsql function
but not for sql function
. Why could that be and can I avoid it without rewriting my functions?
From what I figured, there must be some difference between how parameters are passed in SQL function
, since hard coding date in it prevents query from scanning unnecessary partitions. Maybe something like that happens:
WITH ct AS (SELECT unnest(array[1,2]) AS arr)
SELECT col1, col2
FROM table1
WHERE date_col = (SELECT '2017-07-30'::date)
AND some_col = ANY((SELECT array_agg(arr) FROM ct)::int[])
Producing such EXPLAIN
:
Append (cost=2.78..183.67 rows=3 width=36)
CTE ct
-> Result (cost=0.00..0.51 rows=100 width=4)
InitPlan 2 (returns $1)
-> Result (cost=0.00..0.01 rows=1 width=4)
InitPlan 3 (returns $2)
-> Aggregate (cost=2.25..2.26 rows=1 width=32)
-> CTE Scan on ct (cost=0.00..2.00 rows=100 width=4)
-> Seq Scan on table1 (cost=0.00..0.00 rows=1 width=36)
Filter: ((date_col = $1) AND (some_col = ANY ($2)))
-> Seq Scan on "part$_table1_201707" (cost=0.00..36.75 rows=1 width=36)
Filter: ((date_col = $1) AND (some_col = ANY ($2)))
-> Foreign Scan on "part$_table1_201603" (cost=100.00..144.14 rows=1 width=36)
For the reference, you can reproduce issue on PostgreSQL 9.6.4 using code below:
CREATE SERVER broken_server FOREIGN DATA WRAPPER postgres_fdw
OPTIONS (host 'broken_server', dbname 'postgres',
port '5432');
CREATE USER MAPPING FOR postgres SERVER broken_server
OPTIONS (user 'foreign_username', password 'foreign_password');
CREATE TABLE table1 (id serial PRIMARY KEY, date_col date,
some_col int, col1 int, col2 text);
CREATE TABLE part$_table1_201707 ()
INHERITS (table1);
ALTER TABLE part$_table1_201707 ADD CONSTRAINT part$_table1_201707_date_chk
CHECK (date_col BETWEEN '2017-07-01'::date AND '2017-07-31'::date);
CREATE FOREIGN TABLE part$_table1_201603 ()
INHERITS (table1) SERVER broken_server
OPTIONS (schema_name 'public', table_name 'part$_table1_201603');
ALTER TABLE part$_table1_201603 ADD CONSTRAINT part$_table1_201603_date_chk
CHECK (date_col BETWEEN '2016-03-01'::date AND '2016-03-31'::date);
CREATE OR REPLACE FUNCTION function_plpgsql(param1 date, param2 int[])
RETURNS TABLE(col1 int, col2 text)
LANGUAGE plpgsql
SECURITY DEFINER
AS $function$
BEGIN
--
RETURN QUERY
WITH ct AS (SELECT unnest(param2) AS arr)
SELECT t.col1, t.col2
FROM table1 AS t
WHERE date_col = param1
AND some_col = ANY((SELECT array_agg(arr) FROM ct)::int[]); --reasons
--
END;
$function$;
CREATE OR REPLACE FUNCTION function_sql(param1 date, param2 int[])
RETURNS TABLE(col1 int, col2 text)
LANGUAGE SQL
SECURITY DEFINER
AS $function$
--
WITH ct AS (SELECT unnest(param2) AS arr)
SELECT t.col1, t.col2
FROM table1 AS t
WHERE date_col = param1
AND some_col = ANY((SELECT array_agg(arr) FROM ct)::int[])
--
$function$;
CREATE OR REPLACE FUNCTION function_sql_hardcoded(param1 date, param2 int[])
RETURNS TABLE(col1 int, col2 text)
LANGUAGE SQL
SECURITY DEFINER
AS $function$
--
WITH ct AS (SELECT unnest(param2) AS arr)
SELECT t.col1, t.col2
FROM table1 AS t
WHERE date_col = '2017-07-30'::date
AND some_col = ANY((SELECT array_agg(arr) FROM ct)::int[])
--
$function$;
EXPLAIN ANALYZE
SELECT * FROM function_sql('2017-07-30'::date, array[1,2]);
-- ERROR: could not connect to server "broken_server"
EXPLAIN ANALYZE
SELECT * FROM function_plpgsql('2017-07-30'::date, array[1,2]);
--works
EXPLAIN ANALYZE
SELECT * FROM function_sql_hardcoded('2017-07-30'::date, array[1,2]);
--works, but useless
https://www.postgresql.org/docs/current/static/ddl-partitioning.html
Constraint exclusion only works when the query's WHERE clause contains constants (or externally supplied parameters). For example, a comparison against a non-immutable function such as CURRENT_TIMESTAMP cannot be optimized, since the planner cannot know which partition the function value might fall into at run time.
that would explain scanning unnecessary partitions - plpgsql processes query before giving it to optimyzer I assume, and sql function with constant should work. as well as prepared statement I guess. but comparing attribute value to function parameter is probably not the suitable case :)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With