I'm trying to optimise a complex query in PostgreSQL 9.1.2, which calls some functions. These functions are marked STABLE or IMMUTABLE and are called several times with the same arguments in the query. I assumed PostgreSQL would be smart enough to only call them once for each set of inputs - after all, that's the point of STABLE and IMMUTABLE, isn't it? But it appears that the functions are being called multiple times. I wrote a simple function to test this, which confirms it:
CREATE OR REPLACE FUNCTION test_multi_calls1(one integer)
RETURNS integer
AS $BODY$
BEGIN
RAISE NOTICE 'Called with %', one;
RETURN one;
END;
$BODY$ LANGUAGE plpgsql IMMUTABLE;
WITH data AS
(
SELECT 10 AS num
UNION ALL SELECT 10
UNION ALL SELECT 20
)
SELECT test_multi_calls1(num)
FROM data;
Output:
NOTICE: Called with 10
NOTICE: Called with 10
NOTICE: Called with 20
Why is this happening and how can I get it to only execute the function once?
An IMMUTABLE function cannot modify the database and is guaranteed to return the same results given the same arguments forever. This category allows the optimizer to pre-evaluate the function when a query calls it with constant arguments.
Alternatively, an SQL function can be declared to return a set (that is, multiple rows) by specifying the function's return type as SETOF sometype , or equivalently by declaring it as RETURNS TABLE( columns ) . In this case all rows of the last query's result are returned. Further details appear below.
The following extension of your test code is informative:
CREATE OR REPLACE FUNCTION test_multi_calls1(one integer)
RETURNS integer
AS $BODY$
BEGIN
RAISE NOTICE 'Immutable called with %', one;
RETURN one;
END;
$BODY$ LANGUAGE plpgsql IMMUTABLE;
CREATE OR REPLACE FUNCTION test_multi_calls2(one integer)
RETURNS integer
AS $BODY$
BEGIN
RAISE NOTICE 'Volatile called with %', one;
RETURN one;
END;
$BODY$ LANGUAGE plpgsql VOLATILE;
WITH data AS
(
SELECT 10 AS num
UNION ALL SELECT 10
UNION ALL SELECT 20
)
SELECT test_multi_calls1(num)
FROM data
where test_multi_calls2(40) = 40
and test_multi_calls1(30) = 30
OUTPUT:
NOTICE: Immutable called with 30
NOTICE: Volatile called with 40
NOTICE: Immutable called with 10
NOTICE: Volatile called with 40
NOTICE: Immutable called with 10
NOTICE: Volatile called with 40
NOTICE: Immutable called with 20
Here we can see that while in the select-list the immutable function was called multiple times, in the where clause it was called once, while the volatile was called thrice.
The important thing isn't that PostgreSQL will only call a STABLE
or IMMUTABLE
function once with the same data - your example clearly shows that this is not the case - it's that it may call it only once. Or perhaps it will call it twice when it would have to call a volatile version 50 times, and so on.
There are different ways in which stability and immutability can be taken advantage of, with different costs and benefits. To provide the sort of saving you are suggesting it should make with select-lists it would have to cache the results, and then lookup each argument (or list of arguments) in this cache before either returning the cached result or calling function on a cache-miss. This would be more expensive than calling your function, even in the case where there was a high percentage of cache-hits (there could be 0% cache hits meaning this "optimisation" did extra work for absolutely no gain). It could store maybe just the last parameter and result, but again that could be completely useless.
This is especially so considering that stable and immutable functions are often the lightest functions.
With the where clause however, the immutability of test_multi_calls1
allows PostgreSQL to actually restructure the query from the plain meaning of the SQL given:
For every row calculate test_multi_calls1(30) and if the result is equal to 30 continue processing the row in question
To a different query plan entirely:
Calculate test_multi_calls1(30) and if it is equal to 30 then continue with the query otherwise return a zero row result-set without any further calculation
This is the sort of use that PostgreSQL makes of STABLE and IMMUTABLE - not the caching of results, but the rewriting of queries into different queries which are more efficient but give the same results.
Note also that test_multi_calls1(30) is called before test_multi_calls2(40) no matter what order they appear in the where clause. This means that if the first call results in no rows being returned (replace = 30
with = 31
to test) then the volatile function won't be called at all - again regardless to which is on which side of the and
.
This particular sort of rewriting depends upon immutability or stability. With where test_multi_calls1(30) != num
query re-writing will happen for immutable but not for merely stable functions. With where test_multi_calls1(num) != 30
it won't happen at all (multiple calls) though there are other optimisations possible:
Expressions containing only STABLE and IMMUTABLE functions can be used with index scans. Expressions containing VOLATILE functions cannot. The number of calls may or may not decrease, but much more importantly the results of the calls will then be used in a much more efficient way in the rest of the query (only really matters on large tables, but then it can make a massive difference).
In all, don't think of volatility categories in terms of memoisation, but rather in terms of giving PostgreSQL's query planner opportunities to restructure entire queries in ways that are logically equivalent (same results) but much more efficient.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With