Transpose rows and columns (a.k.a. pivot) only with a minimum COUNT()?

Tags: sql, postgresql, pivot, crosstab

Here's my table 'tab_test':

year    animal  price
2000    kittens 79
2000    kittens 93
2000    kittens 100
2000    puppies 15
2000    puppies 32
2001    kittens 31
2001    kittens 17
2001    puppies 65
2001    puppies 48
2002    kittens 84
2002    kittens 86
2002    puppies 15
2002    puppies 95
2003    kittens 62
2003    kittens 24
2003    puppies 36
2003    puppies 41
2004    kittens 65
2004    kittens 85
2004    puppies 58
2004    puppies 95
2005    kittens 45
2005    kittens 25
2005    puppies 15
2005    puppies 35
2006    kittens 50
2006    kittens 80
2006    puppies 95
2006    puppies 49
2007    kittens 40
2007    kittens 19
2007    puppies 81
2007    puppies 38
2008    kittens 37
2008    kittens 51
2008    puppies 29
2008    puppies 72
2009    kittens 84
2009    kittens 26
2009    puppies 49
2009    puppies 34
2010    kittens 75
2010    kittens 96
2010    puppies 18
2010    puppies 26
2011    kittens 35
2011    kittens 21
2011    puppies 90
2011    puppies 18
2012    kittens 12
2012    kittens 23
2012    puppies 74
2012    puppies 79

Here's some code that transposes the rows and columns so I get an average for 'kittens' and 'puppies':

SELECT
    year,
    AVG(CASE WHEN animal = 'kittens' THEN price END) AS "kittens",
    AVG(CASE WHEN animal = 'puppies' THEN price END) AS "puppies"
FROM tab_test
GROUP BY year
ORDER BY year;

The output for the code above is:

    year    kittens puppies
    2000    90.6666666666667    23.5
    2001    24.0    56.5
    2002    85.0    55.0
    2003    43.0    38.5
    2004    75.0    76.5
    2005    35.0    25.0
    2006    65.0    72.0
    2007    29.5    59.5
    2008    44.0    50.5
    2009    55.0    41.5
    2010    85.5    22.0
    2011    28.0    54.0
    2012    17.5    76.5

What I'd like is a table like the second one, but it would only contain items which had a COUNT() of at least 3 in the first table. In other words, the goal is to have this as output:

year    kittens
2000    90.6666666666667

Only 'kittens' in 2000 has at least 3 rows in the first table; every other year/animal combination has just two.
Is this possible in PostgreSQL?


asked Oct 31 '12 21:10

user1626730

2 Answers

`CASE`

If your case is as simple as demonstrated, a CASE expression will do:

SELECT year
     , sum(CASE WHEN animal = 'kittens' THEN price END) AS kittens
     , sum(CASE WHEN animal = 'puppies' THEN price END) AS puppies
FROM  (
   SELECT year, animal, avg(price) AS price
   FROM   tab_test
   GROUP  BY year, animal
   HAVING count(*) > 2
   ) t
GROUP  BY year
ORDER  BY year;

It doesn't matter whether you use sum(), max() or min() as the aggregate function in the outer query. The subquery returns at most one row per (year, animal), so they all result in the same value in this case.

SQL Fiddle
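To check the claim about the outer aggregate without a PostgreSQL server, here is a minimal sketch that runs the same query shape in SQLite through Python's standard `sqlite3` module. SQLite standing in for PostgreSQL is an assumption made for convenience, the helper name `pivot` is hypothetical, and only the 2000 and 2001 rows from the question are loaded.

```python
import sqlite3

# Subset of the question's sample rows: (year, animal, price).
rows = [
    (2000, "kittens", 79), (2000, "kittens", 93), (2000, "kittens", 100),
    (2000, "puppies", 15), (2000, "puppies", 32),
    (2001, "kittens", 31), (2001, "kittens", 17),
    (2001, "puppies", 65), (2001, "puppies", 48),
]

# Assumption: an in-memory SQLite database stands in for PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tab_test (year INTEGER, animal TEXT, price REAL)")
conn.executemany("INSERT INTO tab_test VALUES (?, ?, ?)", rows)


def pivot(outer_agg):
    """Run the pivot query with the given outer aggregate (sum, max or min)."""
    sql = f"""
        SELECT year
             , {outer_agg}(CASE WHEN animal = 'kittens' THEN price END) AS kittens
             , {outer_agg}(CASE WHEN animal = 'puppies' THEN price END) AS puppies
        FROM  (
           SELECT year, animal, avg(price) AS price
           FROM   tab_test
           GROUP  BY year, animal
           HAVING count(*) > 2
           ) t
        GROUP  BY year
        ORDER  BY year
    """
    return conn.execute(sql).fetchall()


# The subquery yields at most one row per (year, animal),
# so every outer aggregate sees a single value per column.
results = {agg: pivot(agg) for agg in ("sum", "max", "min")}
for agg, result in results.items():
    print(agg, result)
```

Only the year 2000 survives the HAVING filter. The puppies column is NULL (`None` in Python) for that year, since the 2000 puppies group has just two rows.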

`crosstab()`

With more categories it will be simpler with a crosstab() query. This should also be faster for bigger tables.

You need to install the additional module tablefunc (once per database). Since Postgres 9.1 that's as simple as:

CREATE EXTENSION tablefunc;

Details in this related answer:

PostgreSQL Crosstab Query

SELECT * FROM crosstab(
      'SELECT year, animal, avg(price) AS price
       FROM   tab_test
       GROUP  BY animal, year
       HAVING count(*) > 2
       ORDER  BY 1,2'

      ,$$VALUES ('kittens'::text), ('puppies')$$)
AS ct ("year" text, "kittens" numeric, "puppies" numeric);

No sqlfiddle for this one because the site doesn't allow additional modules.
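Since the query can't be tried on the fiddle site, the following is a rough Python sketch of what the two-parameter form does conceptually: it takes (row_name, category, value) rows plus a fixed list of categories and produces one output row per row_name. This is not how tablefunc is implemented. The function below is a hypothetical stand-in, and the input values are made up for illustration.

```python
def crosstab(rows, categories):
    """Pivot (row_name, category, value) tuples into one row per row_name.

    Loose imitation only: categories missing for a row_name become None,
    and categories that are not listed are ignored. The real crosstab()
    expects input sorted by row_name; this sketch uses a dict instead.
    """
    index = {cat: i for i, cat in enumerate(categories)}
    out = {}  # row_name -> list of values, one slot per category
    for row_name, category, value in rows:
        slots = out.setdefault(row_name, [None] * len(categories))
        if category in index:
            slots[index[category]] = value
    return [(row_name, *slots) for row_name, slots in out.items()]


# Made-up stand-in for the inner query's result
# (averages for groups that passed HAVING count(*) > 2).
aggregated = [
    (2000, "kittens", 90.67),
    (2003, "kittens", 41.0),
    (2003, "puppies", 38.5),
]

pivoted = crosstab(aggregated, ["kittens", "puppies"])
for row in pivoted:
    print(row)
```

The second argument plays the role of the `VALUES ('kittens'::text), ('puppies')` list: it fixes the output columns up front, so a group filtered out by HAVING simply leaves its slot empty.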

Benchmark

To verify my claims I ran a quick benchmark with close-to-real data in a small test database on PostgreSQL 9.1.6. Each query was timed with EXPLAIN ANALYZE, best of 10:

Test setup with 10020 rows:

CREATE TABLE tab_test (year int, animal text, price numeric);

-- years with lots of rows
INSERT INTO tab_test
SELECT 2000 + ((g + random() * 300))::int/1000 
     , CASE WHEN (g + (random() * 1.5)::int) %2 = 0 THEN 'kittens' ELSE 'puppies' END
     , (random() * 200)::numeric
FROM   generate_series(1,10000) g;

-- .. and some years with only a few rows to include cases with count < 3
INSERT INTO tab_test
SELECT 2010 + ((g + random() * 10))::int/2
     , CASE WHEN (g + (random() * 1.5)::int) %2 = 0 THEN 'kittens' ELSE 'puppies' END
     , (random() * 200)::numeric
FROM   generate_series(1,20) g;

Results:

@bluefeet
Total runtime: 95.401 ms

@wildplasser (different results, includes rows with count <= 3)
Total runtime: 64.497 ms

@Andriy (+ ORDER BY)
& @Erwin1 - CASE (both perform about the same)
Total runtime: 39.105 ms

@Erwin2 - crosstab()
Total runtime: 17.644 ms

Largely proportional (but irrelevant) results with only 20 rows. Only @wildplasser's CTE has more overhead and spikes a little.

With more than a handful of rows, crosstab() quickly takes the lead. @Andriy's query performs about the same as my simplified version; the aggregate function in the outer SELECT (min(), max(), sum()) makes no measurable difference (just two rows per group).

Everything as expected, no surprises, take my setup and try it @home.


answered Nov 15 '22 22:11

Erwin Brandstetter

Here's an alternative to @bluefeet's suggestion, which is somewhat similar but avoids the join (instead, the upper level grouping is applied to the already grouped result set):

SELECT
  year,
  MAX(CASE animal WHEN 'kittens' THEN avg_price END) AS "kittens",
  MAX(CASE animal WHEN 'puppies' THEN avg_price END) AS "puppies"
FROM (
  SELECT
    animal,
    year,
    COUNT(*) AS cnt,
    AVG(price) AS avg_price
  FROM tab_test
  GROUP BY
    animal,
    year
) s
WHERE cnt >= 3
GROUP BY
  year
;
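Filtering with `WHERE cnt >= 3` on the grouped subquery should select the same groups as `HAVING count(*) > 2` inside it. Here is a small sketch of that comparison, assuming SQLite (via Python's standard `sqlite3` module) as a stand-in for PostgreSQL and loading only the year 2000 rows from the question:

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tab_test (year INTEGER, animal TEXT, price REAL)")
conn.executemany(
    "INSERT INTO tab_test VALUES (?, ?, ?)",
    [
        (2000, "kittens", 79), (2000, "kittens", 93), (2000, "kittens", 100),
        (2000, "puppies", 15), (2000, "puppies", 32),
    ],
)

# Variant 1: filter the groups directly with HAVING.
having_sql = """
    SELECT year, animal, avg(price)
    FROM   tab_test
    GROUP  BY year, animal
    HAVING count(*) > 2
"""

# Variant 2: expose the count as a column and filter it one level up.
where_sql = """
    SELECT year, animal, avg_price
    FROM  (
       SELECT year, animal, count(*) AS cnt, avg(price) AS avg_price
       FROM   tab_test
       GROUP  BY year, animal
       ) s
    WHERE  cnt >= 3
"""

having_rows = conn.execute(having_sql).fetchall()
where_rows = conn.execute(where_sql).fetchall()
print(having_rows == where_rows)
```

Both variants keep only the (2000, kittens) group, so the choice between them is mostly a matter of taste.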

answered Nov 15 '22 21:11

Andriy M
