In Bruce Momjian's blog post Generating Random Data Via SQL he used the following code to generate 5 random strings:
SELECT
(
SELECT string_agg(x, '')
FROM (
SELECT chr(ascii('a') + floor(random() * 26)::integer)
FROM generate_series(1, 40 + b * 0) as f(g)
) AS y(x)
) AS result
FROM generate_series(1,5) as a(b);
result
------------------------------------------
plwfwcgajxdygfissmxqsywcwiqptytjjppgrvgb
sjaypirhuoynnvqjdgywfsfphuvzqbbilbhakyhf
ngtabkjfqibwahlicgisijatliuwgbcuiwujgeox
mqtnyewalettounachwjjzdrvxbbbpzogscexyfi
dzcstpsvwpefohwkfxmhnlwteyybxejbdltwamsx
(5 rows)
I wondered why 'b * 0' at line 6 is required. When I removed it, the result changed to 5 exactly similar strings which means Postgres cached the outer select expression (result)!
I could not find how expression caching is working in Postgres. According to the documentation random() function is marked VOLATILE, so, I'd expect any expression depends on it to be volatile too.
How does expression caching work in Postgres? Is it documented anywhere? Why 'b*0' disabled the cache where random() did not?
Update:
To study the issue, I moved 'b * 0' to inside the floor() call to be at same position/level as random():
...
SELECT chr(ascii('a') + floor(random() * 26 + b * 0)::integer)
FROM generate_series(1, 40) as s(f)
...
The result is still not cached; different strings.
Update: Another example to show the problem
create sequence seq_test;
SELECT (SELECT nextval('seq_test')) FROM generate_series(1,5);
?column?
----------
1
1
1
1
1
(5 rows)
Well, random()
itself is volatile, hence you don't get strings with the same character repeated to the end.
If you look at the plans for the queries with and without b*0
you will see:
With b*0
:
Function Scan on generate_series a (cost=0.00..37530.00 rows=1000 width=4)
SubPlan 1
-> Aggregate (cost=37.51..37.52 rows=1 width=32)
-> Function Scan on generate_series (cost=0.01..25.01 rows=1000 width=0)
Without b*0
:
Function Scan on generate_series a (cost=37.52..47.52 rows=1000 width=0)
InitPlan 1 (returns $0)
-> Aggregate (cost=37.50..37.51 rows=1 width=32)
-> Function Scan on generate_series (cost=0.00..25.00 rows=1000 width=0)
If PostgreSQL determines that the inner aggregation is not dependent on a
, then it is evaluated once as an InitPlan
, and the volatility or not of the expressions within are irrelevant. By introducing the dependency of the subquery on a
, i.e. making it a correlated subquery, the evaluation must be redone for each row of a
.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With