Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

PostgreSQL volatile expressions and subqueries

In Bruce Momjian's blog post Generating Random Data Via SQL he used the following code to generate 5 random strings:

SELECT
(
        SELECT string_agg(x, '')
        FROM (
                SELECT chr(ascii('a') + floor(random() * 26)::integer)
                FROM generate_series(1, 40 + b * 0) as f(g)
        ) AS y(x)
) AS result
FROM generate_series(1,5) as a(b);

              result                  
------------------------------------------
 plwfwcgajxdygfissmxqsywcwiqptytjjppgrvgb
 sjaypirhuoynnvqjdgywfsfphuvzqbbilbhakyhf
 ngtabkjfqibwahlicgisijatliuwgbcuiwujgeox
 mqtnyewalettounachwjjzdrvxbbbpzogscexyfi
 dzcstpsvwpefohwkfxmhnlwteyybxejbdltwamsx
(5 rows)

I wondered why 'b * 0' at line 6 is required. When I removed it, the result changed to 5 exactly similar strings which means Postgres cached the outer select expression (result)!

I could not find how expression caching is working in Postgres. According to the documentation random() function is marked VOLATILE, so, I'd expect any expression depends on it to be volatile too.

How does expression caching work in Postgres? Is it documented anywhere? Why 'b*0' disabled the cache where random() did not?

Update:

To study the issue, I moved 'b * 0' to inside the floor() call to be at same position/level as random():

...
                SELECT chr(ascii('a') + floor(random() * 26 + b * 0)::integer)
                FROM generate_series(1, 40) as s(f)
...

The result is still not cached; different strings.

Update: Another example to show the problem

create sequence seq_test;

SELECT (SELECT nextval('seq_test')) FROM generate_series(1,5);

 ?column? 
----------
        1
        1
        1
        1
        1
(5 rows)
like image 612
Mohammad Alhashash Avatar asked Oct 07 '22 03:10

Mohammad Alhashash


1 Answers

Well, random() itself is volatile, hence you don't get strings with the same character repeated to the end.

If you look at the plans for the queries with and without b*0 you will see:

With b*0:

 Function Scan on generate_series a  (cost=0.00..37530.00 rows=1000 width=4)
   SubPlan 1
     ->  Aggregate  (cost=37.51..37.52 rows=1 width=32)
           ->  Function Scan on generate_series  (cost=0.01..25.01 rows=1000 width=0)

Without b*0:

 Function Scan on generate_series a  (cost=37.52..47.52 rows=1000 width=0)
   InitPlan 1 (returns $0)
     ->  Aggregate  (cost=37.50..37.51 rows=1 width=32)
           ->  Function Scan on generate_series  (cost=0.00..25.00 rows=1000 width=0)

If PostgreSQL determines that the inner aggregation is not dependent on a, then it is evaluated once as an InitPlan, and the volatility or not of the expressions within are irrelevant. By introducing the dependency of the subquery on a, i.e. making it a correlated subquery, the evaluation must be redone for each row of a.

like image 95
araqnid Avatar answered Oct 13 '22 12:10

araqnid