Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What is the expected behaviour for multiple set-returning functions in SELECT clause?

I'm trying to get a "cross join" with the result of two set-returning functions, but in some cases I don't get the "cross join", see example

Behaviour 1: When set lenghts are the same, it matches item by item from each set

postgres=# SELECT generate_series(1,3), generate_series(5,7) order by 1,2;
 generate_series | generate_series 
-----------------+-----------------
               1 |               5
               2 |               6
               3 |               7
(3 rows)

Behaviour 2: When set lenghts are different, it "cross join"s the sets

postgres=# SELECT generate_series(1,2), generate_series(5,7) order by 1,2;
 generate_series | generate_series 
-----------------+-----------------
               1 |               5
               1 |               6
               1 |               7
               2 |               5
               2 |               6
               2 |               7
(6 rows)

I think I'm not understanding something here, can someone explain the expeted behaviour?

Another example, even weirder:

postgres=# SELECT generate_series(1,2) x, generate_series(1,4) y order by x,y;
 x | y 
---+---
 1 | 1
 1 | 3
 2 | 2
 2 | 4
(4 rows)

I am looking for an answer to the question in the title, ideally with link(s) to documentation.

like image 898
jperelli Avatar asked Oct 05 '16 00:10

jperelli


3 Answers

Postgres 10 or newer

pads with null values for smaller set(s). Demo with generate_series():

SELECT generate_series( 1,  2) AS row2
     , generate_series(11, 13) AS row3
     , generate_series(21, 24) AS row4;
row2 | row3 | row4
-----+------+-----
   1 |   11 |   21
   2 |   12 |   22
null |   13 |   23
null | null |   24

dbfiddle here

The manual for Postgres 10:

If there is more than one set-returning function in the query's select list, the behavior is similar to what you get from putting the functions into a single LATERAL ROWS FROM( ... ) FROM-clause item. For each row from the underlying query, there is an output row using the first result from each function, then an output row using the second result, and so on. If some of the set-returning functions produce fewer outputs than others, null values are substituted for the missing data, so that the total number of rows emitted for one underlying row is the same as for the set-returning function that produced the most outputs. Thus the set-returning functions run “in lockstep” until they are all exhausted, and then execution continues with the next underlying row.

This ends the traditionally odd behavior.

Some other details changed with this rewrite. The release notes:

  • Change the implementation of set-returning functions appearing in a query's SELECT list (Andres Freund)

    Set-returning functions are now evaluated before evaluation of scalar expressions in the SELECT list, much as though they had been placed in a LATERAL FROM-clause item. This allows saner semantics for cases where multiple set-returning functions are present. If they return different numbers of rows, the shorter results are extended to match the longest result by adding nulls. Previously the results were cycled until they all terminated at the same time, producing a number of rows equal to the least common multiple of the functions' periods. In addition, set-returning functions are now disallowed within CASE and COALESCE constructs. For more information see Section 37.4.8.

Bold emphasis mine.

Postgres 9.6 or older

The number of result rows (somewhat surprisingly!) is the lowest common multiple of all sets in the same SELECT list. (Only acts like a CROSS JOIN if there is no common divisor to all set-sizes!) Demo:

SELECT generate_series( 1,  2) AS row2
     , generate_series(11, 13) AS row3
     , generate_series(21, 24) AS row4;
row2 | row3 | row4
-----+------+-----
   1 |   11 |   21
   2 |   12 |   22
   1 |   13 |   23
   2 |   11 |   24
   1 |   12 |   21
   2 |   13 |   22
   1 |   11 |   23
   2 |   12 |   24
   1 |   13 |   21
   2 |   11 |   22
   1 |   12 |   23
   2 |   13 |   24

dbfiddle here

Documented in manual for Postgres 9.6 the chapter SQL Functions Returning Sets, along with the recommendation to avoid it:

Note: The key problem with using set-returning functions in the select list, rather than the FROM clause, is that putting more than one set-returning function in the same select list does not behave very sensibly. (What you actually get if you do so is a number of output rows equal to the least common multiple of the numbers of rows produced by each set-returning function.) The LATERAL syntax produces less surprising results when calling multiple set-returning functions, and should usually be used instead.

Bold emphasis mine.

A single set-returning function is OK (but still cleaner in the FROM list), but multiple in the same SELECT list is discouraged now. This was a useful feature before we had LATERAL joins. Now it's merely historical ballast.

Related:

  • Parallel unnest() and sort order in PostgreSQL
  • Unnest multiple arrays in parallel
  • What is the difference between a LATERAL JOIN and a subquery in PostgreSQL?
like image 166
Erwin Brandstetter Avatar answered Nov 19 '22 21:11

Erwin Brandstetter


I cannot find any documentation for this. However, I can describe the behavior that I observe.

The set generating functions each return a finite number of rows. Postgres seems to run the set generating functions until all of them are on their last row -- or, more likely stop when all are back to their first rows. Technically, this would be the least common multiple (LCM) of the series lengths.

I'm not sure why this is the case. And, as I say in a comment, I think it is better to generally put the functions in the from clause.

like image 29
Gordon Linoff Avatar answered Nov 19 '22 22:11

Gordon Linoff


There is the only note about the issue in the documentation. I'm not sure whether this explains the described behavior or not. Perhaps more important is that such function usage is deprecated:

Currently, functions returning sets can also be called in the select list of a query. For each row that the query generates by itself, the function returning set is invoked, and an output row is generated for each element of the function's result set. Note, however, that this capability is deprecated and might be removed in future releases.

like image 1
klin Avatar answered Nov 19 '22 20:11

klin