Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

PostgreSQL Crosstab Query

People also ask

What is crosstab query in PostgreSQL?

The crosstab function produces one output row for each consecutive group of input rows with the same row_name value. The output row_name column, plus any “extra” columns, are copied from the first row of the group. The output value columns are filled with the value fields from rows having matching category values.

Does Postgres have pivot?

SQL Server provides PIVOT and UNPIVOT functions to create pivot tables. Unfortunately, PostgreSQL does implement them.


Install the additional module tablefunc once per database, which provides the function crosstab(). Since Postgres 9.1 you can use CREATE EXTENSION for that:

CREATE EXTENSION IF NOT EXISTS tablefunc;

Improved test case

CREATE TABLE tbl (
   section   text
 , status    text
 , ct        integer  -- "count" is a reserved word in standard SQL
);

INSERT INTO tbl VALUES 
  ('A', 'Active', 1), ('A', 'Inactive', 2)
, ('B', 'Active', 4), ('B', 'Inactive', 5)
                    , ('C', 'Inactive', 7);  -- ('C', 'Active') is missing

Simple form - not fit for missing attributes

crosstab(text) with 1 input parameter:

SELECT *
FROM   crosstab(
   'SELECT section, status, ct
    FROM   tbl
    ORDER  BY 1,2'  -- needs to be "ORDER BY 1,2" here
   ) AS ct ("Section" text, "Active" int, "Inactive" int);

Returns:

 Section | Active | Inactive
---------+--------+----------
 A       |      1 |        2
 B       |      4 |        5
 C       |      7 |           -- !!
  • No need for casting and renaming.
  • Note the incorrect result for C: the value 7 is filled in for the first column. Sometimes, this behavior is desirable, but not for this use case.
  • The simple form is also limited to exactly three columns in the provided input query: row_name, category, value. There is no room for extra columns like in the 2-parameter alternative below.

Safe form

crosstab(text, text) with 2 input parameters:

SELECT *
FROM   crosstab(
   'SELECT section, status, ct
    FROM   tbl
    ORDER  BY 1,2'  -- could also just be "ORDER BY 1" here

  , $$VALUES ('Active'::text), ('Inactive')$$
   ) AS ct ("Section" text, "Active" int, "Inactive" int);

Returns:

 Section | Active | Inactive
---------+--------+----------
 A       |      1 |        2
 B       |      4 |        5
 C       |        |        7  -- !!
  • Note the correct result for C.

  • The second parameter can be any query that returns one row per attribute matching the order of the column definition at the end. Often you will want to query distinct attributes from the underlying table like this:

      'SELECT DISTINCT attribute FROM tbl ORDER BY 1'
    

That's in the manual.

Since you have to spell out all columns in a column definition list anyway (except for pre-defined crosstabN() variants), it is typically more efficient to provide a short list in a VALUES expression like demonstrated:

    $$VALUES ('Active'::text), ('Inactive')$$)

Or (not in the manual):

    $$SELECT unnest('{Active,Inactive}'::text[])$$  -- short syntax for long lists
  • I used dollar quoting to make quoting easier.

  • You can even output columns with different data types with crosstab(text, text) - as long as the text representation of the value column is valid input for the target type. This way you might have attributes of different kind and output text, date, numeric etc. for respective attributes. There is a code example at the end of the chapter crosstab(text, text) in the manual.

db<>fiddle here

Effect of excess input rows

Excess input rows are handled differently - duplicate rows for the same ("row_name", "category") combination - (section, status) in the above example.

The 1-parameter form fills in available value columns from left to right. Excess values are discarded.
Earlier input rows win.

The 2-parameter form assigns each input value to its dedicated column, overwriting any previous assignment.
Later input rows win.

Typically, you don't have duplicates to begin with. But if you do, carefully adjust the sort order to your requirements - and document what's happening.
Or get fast arbitrary results if you don't care. Just be aware of the effect.

Advanced examples

  • Pivot on Multiple Columns using Tablefunc - also demonstrating mentioned "extra columns"

  • Dynamic alternative to pivot with CASE and GROUP BY


\crosstabview in psql

Postgres 9.6 added this meta-command to its default interactive terminal psql. You can run the query you would use as first crosstab() parameter and feed it to \crosstabview (immediately or in the next step). Like:

db=> SELECT section, status, ct FROM tbl \crosstabview

Similar result as above, but it's a representation feature on the client side exclusively. Input rows are treated slightly differently, hence ORDER BY is not required. Details for \crosstabview in the manual. There are more code examples at the bottom of that page.

Related answer on dba.SE by Daniel Vérité (the author of the psql feature):

  • How do I generate a pivoted CROSS JOIN where the resulting table definition is unknown?

SELECT section,
       SUM(CASE status WHEN 'Active' THEN count ELSE 0 END) AS active, --here you pivot each status value as a separate column explicitly
       SUM(CASE status WHEN 'Inactive' THEN count ELSE 0 END) AS inactive --here you pivot each status  value as a separate column explicitly

FROM t
GROUP BY section

You can use the crosstab() function of the additional module tablefunc - which you have to install once per database. Since PostgreSQL 9.1 you can use CREATE EXTENSION for that:

CREATE EXTENSION tablefunc;

In your case, I believe it would look something like this:

CREATE TABLE t (Section CHAR(1), Status VARCHAR(10), Count integer);

INSERT INTO t VALUES ('A', 'Active',   1);
INSERT INTO t VALUES ('A', 'Inactive', 2);
INSERT INTO t VALUES ('B', 'Active',   4);
INSERT INTO t VALUES ('B', 'Inactive', 5);

SELECT row_name AS Section,
       category_1::integer AS Active,
       category_2::integer AS Inactive
FROM crosstab('select section::text, status, count::text from t',2)
            AS ct (row_name text, category_1 text, category_2 text);

Solution with JSON aggregation:

CREATE TEMP TABLE t (
  section   text
, status    text
, ct        integer  -- don't use "count" as column name.
);

INSERT INTO t VALUES 
  ('A', 'Active', 1), ('A', 'Inactive', 2)
, ('B', 'Active', 4), ('B', 'Inactive', 5)
                   , ('C', 'Inactive', 7); 


SELECT section,
       (obj ->> 'Active')::int AS active,
       (obj ->> 'Inactive')::int AS inactive
FROM (SELECT section, json_object_agg(status,ct) AS obj
      FROM t
      GROUP BY section
     )X

Sorry this isn't complete because I can't test it here, but it may get you off in the right direction. I'm translating from something I use that makes a similar query:

select mt.section, mt1.count as Active, mt2.count as Inactive
from mytable mt
left join (select section, count from mytable where status='Active')mt1
on mt.section = mt1.section
left join (select section, count from mytable where status='Inactive')mt2
on mt.section = mt2.section
group by mt.section,
         mt1.count,
         mt2.count
order by mt.section asc;

The code I'm working from is:

select m.typeID, m1.highBid, m2.lowAsk, m1.highBid - m2.lowAsk as diff, 100*(m1.highBid - m2.lowAsk)/m2.lowAsk as diffPercent
from mktTrades m
   left join (select typeID,MAX(price) as highBid from mktTrades where bid=1 group by typeID)m1
   on m.typeID = m1.typeID
   left join (select typeID,MIN(price) as lowAsk  from mktTrades where bid=0 group by typeID)m2
   on m1.typeID = m2.typeID
group by m.typeID, 
         m1.highBid, 
         m2.lowAsk
order by diffPercent desc;

which will return a typeID, the highest price bid and the lowest price asked and the difference between the two (a positive difference would mean something could be bought for less than it can be sold).