I'm working with two PostgreSQL 9.6 databases and am trying to query one of the DB's from the other using postgres_fdw (one is a production backup DB that has the data and the other is a db for doing various analyses).
I've come across some odd behavior though where certain types of WHERE clauses in a query aren't being passed to the remote DB but instead are retained in the local DB and used to filter the results received from the remote DB. This is causing the remote DB to try to send way more info than the local DB needs across the network and the affected queries are dramatically slower (15 seconds vs 15 minutes).
I've mostly seen this with timestamp related clauses, the below examples are how I first came across the problem, but I've seen it in several other variations such as replacing CURRENT_TIMESTAMP with a TIMESTAMP literal (slow) or TIMESTAMP WITH TIME ZONE literal (fast).
Is there a setting somewhere that I'm missing that'll help with this? I'm setting this up for a team to use with a mixed level of SQL backgrounds, most aren't experienced with reviewing EXPLAIN plans and whatnot. I've come up with some workarounds (such as putting the relative time clauses in sub-SELECT), but I keep coming across new instances of the problem.
An example:
SELECT var_1
,var_2
FROM schema_A.table_A
WHERE execution_ts <= CURRENT_TIMESTAMP - INTERVAL '1 hour'
AND execution_ts >= CURRENT_TIMESTAMP - INTERVAL '1 week' - INTERVAL '1 hour'
ORDER BY var_1
Explain Plan
Sort (cost=147.64..147.64 rows=1 width=1048)
Output: table_A.var_1, table_A.var_2
Sort Key: (table_A.var_1)::text
-> Foreign Scan on schema_A.table_A (cost=100.00..147.63 rows=1 width=1048)
Output: table_A.var_1, table_A.var_2
Filter: ((table_A.execution_ts <= (now() - '01:00:00'::interval))
AND (table_A.execution_ts >= ((now() - '7 days'::interval) - '01:00:00'::interval)))
Remote SQL: SELECT var_1, execution_ts FROM model.table_A
WHERE ((model_id::text = 'ABCD'::text))
AND ((var_1 = ANY ('{1,2,3,4,5}'::bigint[])))
The above takes around 15-20 minutes to run, while the below completes in seconds.
SELECT var_1
,var_2
FROM schema_A.table_A
WHERE execution_ts <= (SELECT CURRENT_TIMESTAMP - INTERVAL '1 hour')
AND execution_ts >= (SELECT CURRENT_TIMESTAMP - INTERVAL '1 week' - INTERVAL '1 hour')
ORDER BY var_1
Explain Plan
Sort (cost=158.70..158.71 rows=1 width=16)
Output: table_A.var_1, table_A.var_2
Sort Key: table_A.var_1
InitPlan 1 (returns $0)
-> Result (cost=0.00..0.01 rows=1 width=8)
Output: (now() - '01:00:00'::interval)
InitPlan 2 (returns $1)
-> Result (cost=0.00..0.02 rows=1 width=8)
Output: ((now() - '7 days'::interval) - '01:00:00'::interval)
-> Foreign Scan on schema_A.table_A (cost=100.00..158.66 rows=1 width=16)
Output: table_A.var_1, table_A.var_2
Remote SQL: SELECT var_1, var_2 FROM model.table_A
WHERE ((execution_ts <= $1::timestamp with time zone))
AND ((execution_ts >= $2::timestamp with time zone))
AND ((model_id::text = 'ABCD'::text))
AND ((var_1 = ANY ('{1,2,3,4,5}'::bigint[])))
Any function that is not IMMUTABLE
will not be pushed down.
See function is_foreign_expr
in contrib/postgres_fdw/deparse.c
:
/*
* Returns true if given expr is safe to evaluate on the foreign server.
*/
bool
is_foreign_expr(PlannerInfo *root,
RelOptInfo *baserel,
Expr *expr)
{
...
/*
* An expression which includes any mutable functions can't be sent over
* because its result is not stable. For example, sending now() remote
* side could cause confusion from clock offsets. Future versions might
* be able to make this choice with more granularity. (We check this last
* because it requires a lot of expensive catalog lookups.)
*/
if (contain_mutable_functions((Node *) expr))
return false;
/* OK to evaluate on the remote server */
return true;
}
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With