Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Crosstab function in Postgres returning a one row output when I expect multiple rows

I currently have a table m of the following format:

id    scenario    period    ct
2        1          1       1  
2        1          2       1
2        1          3       1
2        1          4       1
2        2          1       1
2        2          2       1
2        2          3       1
2        2          4       1
2        3          1       1
2        3          2       1
2        3          3       1
2        3          4       1

I want to create the following table:

id    scenario    period    1    2    3    4
2        1          1       1
2        1          2            1
2        1          3                 1
2        1          4                      1
2        2          1       1
2        2          2            1
2        2          3                 1
2        2          4                      1
2        3          1       1
2        3          2            1
2        3          3                 1
2        3          4                      1

The tablefunc extension has been created in my Postgres database already. I'm currently trying to use the crosstab() function to complete the pivot. However, I'm getting a table that looks like the following:

id    scenario    period    1    2   3    4 
2        1          1       1    1   1    1

The query I tried:

SELECT * FROM crosstab(
        'SELECT id, scenario, period, ct FROM m 
            ORDER BY 1',
         'SELECT DISTINCT period FROM m 
            ORDER BY 1')
AS (id, scenario, period, 1, 2, 3, 4);
like image 208
Stevie Kobori Avatar asked Oct 03 '15 00:10

Stevie Kobori


1 Answers

This query generates your desired output:

SELECT id, scenario, period, p1, p2, p3, p4  -- all except aux column rn
FROM   crosstab(
  'SELECT row_number() OVER (ORDER BY id, scenario, period)::int AS rn
        , id, scenario, period, period, ct
   FROM   m
   ORDER  BY 1'
, 'VALUES (1), (2), (3), (4)'
   ) AS (rn int, id int, scenario int, period int, p1 int, p2 int, p3 int, p4 int);

Two special difficulties:

  1. You don't have a single unique column for the row_name yet. I use row_number() to generate the surrogate key: rn. I removed it from the outer SELECT to match your desired result.
    The way you tried it, id is taken to be the row_name and all input rows are aggregated into a single output row.

  2. You want additional columns (scenario and period) in the result, which must come after the row_name and before the category. You must list period twice to get the original column additionally - redundant as though it may seem.

Basics:

  • PostgreSQL Crosstab Query

Related to this particular case:

  • Pivot on Multiple Columns using Tablefunc

Typically, you would have a query like this:

SELECT id, scenario, p1, p2, p3, p4  -- all except aux column rn
FROM   crosstab(
  'SELECT rank() OVER (ORDER BY id, scenario)::int AS rn
        , id, scenario, period, ct
   FROM   m
   ORDER  BY 1'
, 'VALUES (1), (2), (3), (4)'
   ) AS (rn int, id int, scenario int, p1 int, p2 int, p3 int, p4 int);

With an output like this:

id   scenario   p1   p2   p3   p4
2    1          1    1    1    1
2    2          1    1    1    1
2    3          1    1    1    1

Note the use of rank() instead of row_number() to group same combinations of (id, scenario) together.
The result makes more sense if counts are not all 1.

like image 177
Erwin Brandstetter Avatar answered Oct 20 '22 18:10

Erwin Brandstetter