PostgreSQL volatile expressions and subqueries

Question

In Bruce Momjian's blog post Generating Random Data Via SQL he used the following code to generate 5 random strings:

SELECT
(
        SELECT string_agg(x, '')
        FROM (
                SELECT chr(ascii('a') + floor(random() * 26)::integer)
                FROM generate_series(1, 40 + b * 0) as f(g)
        ) AS y(x)
) AS result
FROM generate_series(1,5) as a(b);

              result                  
------------------------------------------
 plwfwcgajxdygfissmxqsywcwiqptytjjppgrvgb
 sjaypirhuoynnvqjdgywfsfphuvzqbbilbhakyhf
 ngtabkjfqibwahlicgisijatliuwgbcuiwujgeox
 mqtnyewalettounachwjjzdrvxbbbpzogscexyfi
 dzcstpsvwpefohwkfxmhnlwteyybxejbdltwamsx
(5 rows)

I wondered why 'b * 0' at line 6 is required. When I removed it, the result changed to 5 exactly similar strings which means Postgres cached the outer select expression (result)!

I could not find how expression caching is working in Postgres. According to the documentation random() function is marked VOLATILE, so, I'd expect any expression depends on it to be volatile too.

How does expression caching work in Postgres? Is it documented anywhere? Why 'b*0' disabled the cache where random() did not?

Update:

To study the issue, I moved 'b * 0' to inside the floor() call to be at same position/level as random():

...
                SELECT chr(ascii('a') + floor(random() * 26 + b * 0)::integer)
                FROM generate_series(1, 40) as s(f)
...

The result is still not cached; different strings.

Update: Another example to show the problem

create sequence seq_test;

SELECT (SELECT nextval('seq_test')) FROM generate_series(1,5);

 ?column? 
----------
        1
        1
        1
        1
        1
(5 rows)

araqnid · Accepted Answer

Well, random() itself is volatile, hence you don't get strings with the same character repeated to the end.

If you look at the plans for the queries with and without b*0 you will see:

With b*0:

 Function Scan on generate_series a  (cost=0.00..37530.00 rows=1000 width=4)
   SubPlan 1
     ->  Aggregate  (cost=37.51..37.52 rows=1 width=32)
           ->  Function Scan on generate_series  (cost=0.01..25.01 rows=1000 width=0)

Without b*0:

 Function Scan on generate_series a  (cost=37.52..47.52 rows=1000 width=0)
   InitPlan 1 (returns $0)
     ->  Aggregate  (cost=37.50..37.51 rows=1 width=32)
           ->  Function Scan on generate_series  (cost=0.00..25.00 rows=1000 width=0)

If PostgreSQL determines that the inner aggregation is not dependent on a, then it is evaluated once as an InitPlan, and the volatility or not of the expressions within are irrelevant. By introducing the dependency of the subquery on a, i.e. making it a correlated subquery, the evaluation must be redone for each row of a.

PostgreSQL volatile expressions and subqueries

Tags:

postgresql

generate-series

Mohammad Alhashash

1 Answers

araqnid

Recent Activity

Donate For Us

PostgreSQL volatile expressions and subqueries

Tags:

postgresql

generate-series

Mohammad Alhashash

1 Answers

araqnid

Related questions

Recent Activity

Donate For Us