Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Sum by month and put months as columns

Background

I have time series data on a monthly basis and I would like to sum values for each ID, grouped by month and then have the month names as columns rather than as rows.

Example

+----+------------+-------+-------+
| id | extra_info | month | value |
+----+------------+-------+-------+
| 1  | abc        | jan   | 10    |
| 1  | abc        | feb   | 20    |
| 2  | def        | jan   | 10    |
| 2  | def        | feb   | 5     |
| 1  | abc        | jan   | 15    |
| 3  | ghi        | mar   | 15    |

Desired Result

+----+------------+-----+-----+-----+
| id | extra_info | jan | feb | mar |
+----+------------+-----+-----+-----+
| 1  | abc        | 25  | 20  | 0   |
| 2  | def        | 10  | 5   | 0   |
| 3  | ghi        | 0   | 0   | 15  |

Current Approach

I can easily group by month, summing the values. Which gets me to:

-----------------------------------
| id | extra_info | month | value |
+----+------------+-------+-------+
| 1  | abc        | jan   | 25    |
| 1  | abc        | feb   | 20    |
| 2  | def        | jan   | 10    |
| 2  | def        | feb   | 5     |
| 3  | ghi        | mar   | 15    |

But I now need those months as column names. Not sure where to go from here.

Additional Information

  • In terms of language, this query is to be run in postgres.
  • The months above are just examples, obviously the real data set is much larger and covers all 12 months across thousands of IDs

Any ideas from an SQL guru very much appreciated!

like image 818
Pete Hamilton Avatar asked May 16 '13 21:05

Pete Hamilton


2 Answers

You can use an aggregate function with a CASE expression to turn the rows into columns:

select id,
  extra_info,
  sum(case when month = 'jan' then value else 0 end) jan,
  sum(case when month = 'feb' then value else 0 end) feb,
  sum(case when month = 'mar' then value else 0 end) mar,
  sum(case when month = 'apr' then value else 0 end) apr,
  sum(case when month = 'may' then value else 0 end) may,
  sum(case when month = 'jun' then value else 0 end) jun,
  sum(case when month = 'jul' then value else 0 end) jul,
  sum(case when month = 'aug' then value else 0 end) aug,
  sum(case when month = 'sep' then value else 0 end) sep,
  sum(case when month = 'oct' then value else 0 end) oct,
  sum(case when month = 'nov' then value else 0 end) nov,
  sum(case when month = 'dec' then value else 0 end) "dec"
from yt
group by id, extra_info

See SQL Fiddle with Demo

like image 195
Taryn Avatar answered Oct 17 '22 00:10

Taryn


Setup

CREATE TABLE tbl (
  id int
, extra_info varchar(3)
, month date
, value int
);
   
INSERT INTO tbl VALUES
  (1, 'abc', '2012-01-01', 10)
, (1, 'abc', '2012-02-01', 20)
, (2, 'def', '2012-01-01', 10)
, (2, 'def', '2012-02-01',  5)
, (1, 'abc', '2012-01-01', 15)
, (3, 'ghi', '2012-03-01', 15)
;

crosstab()

I would use crosstab() from the additional tablefunc module. Install once per database with:

CREATE EXTENSION tablefunc;

Basics:

  • PostgreSQL Crosstab Query

How to deal with "extra" columns:

  • Pivot on Multiple Columns using Tablefunc

Advanced usage:

  • Dynamic alternative to pivot with CASE and GROUP BY

Query

SELECT * FROM crosstab(
   $$
   SELECT id, min(extra_info), month, sum(value) AS value
   FROM   tbl
   GROUP  BY id, month
   ORDER  BY id, month
   $$
 , $$
   VALUES
     ('jan'::text), ('feb'), ('mar'), ('apr'), ('may'), ('jun')
   , ('jul'),       ('aug'), ('sep'), ('oct'), ('nov'), ('dec')
   $$
   ) AS ct (id  int, extra text
          , jan int, feb int, mar int, apr int, may int, jun int
          , jul int, aug int, sep int, oct int, nov int, dec int);

Obviously, you can only output one extra_info per id. I pick min(extra_info) since you didn't specify. If all are the same per id, you could also group by it additionally.

Result:

 id | extra | jan | feb | mar | apr | may | jun | jul | aug | sep | oct | nov | dec
----+-------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----
  1 | abc   |  25 |  20 |     |     |     |     |     |     |     |     |     |
  2 | def   |  10 |   5 |     |     |     |     |     |     |     |     |     |
  3 | ghi   |     |     |  15 |     |     |     |     |     |     |     |     |

db<>fiddle here

Installing the tablefunc module (once per database) incurs some overhead, but queries are typically faster and shorter.

Pure SQL

If you can't or won't install the additional module, plain SQL got s bit faster with the aggregate FILTER clause added with Postgres 9.4. See:

  • Aggregate columns with additional (distinct) filters
SELECT id, min(extra_info) AS extra
     , sum(value) FILTER (WHERE month = 'jan') AS jan
     , sum(value) FILTER (WHERE month = 'feb') AS feb
     , sum(value) FILTER (WHERE month = 'mar') AS mar
     , sum(value) FILTER (WHERE month = 'apr') AS apr
     , sum(value) FILTER (WHERE month = 'may') AS may
     , sum(value) FILTER (WHERE month = 'jun') AS jun
     , sum(value) FILTER (WHERE month = 'jul') AS jul
     , sum(value) FILTER (WHERE month = 'aug') AS aug
     , sum(value) FILTER (WHERE month = 'sep') AS sep
     , sum(value) FILTER (WHERE month = 'oct') AS oct
     , sum(value) FILTER (WHERE month = 'nov') AS nov
     , sum(value) FILTER (WHERE month = 'dec') AS dec
FROM   tbl
GROUP  BY id
ORDER  BY id;

0 instead of NULL

To output 0 instead of NULL for missing values, use COALESCE for either query:

SELECT id, extra
     , COALESCE(jan, 0) AS jan
     , COALESCE(feb, 0) AS feb
     , COALESCE(mar, 0) AS mar
     , COALESCE(apr, 0) AS apr
     , COALESCE(may, 0) AS may
     , COALESCE(jun, 0) AS jun
     , COALESCE(jul, 0) AS jul
     , COALESCE(aug, 0) AS aug
     , COALESCE(sep, 0) AS sep
     , COALESCE(oct, 0) AS oct
     , COALESCE(nov, 0) AS nov
     , COALESCE(dec, 0) AS dec
FROM  (<query from above>)

db<>fiddle here

like image 21
Erwin Brandstetter Avatar answered Oct 16 '22 23:10

Erwin Brandstetter