
Postgres - how to return rows with 0 count for missing data?

I have unevenly distributed data (with respect to date) for a few years (2003-2008). I want to query the data for a given start and end date, grouping it by any of the supported intervals (day, week, month, quarter, year) in PostgreSQL 8.3 (http://www.postgresql.org/docs/8.3/static/functions-datetime.html#FUNCTIONS-DATETIME-TRUNC).

The problem is that some of the queries return a row for every interval in the required period, like this one:

select to_char(date_trunc('month',date), 'YYYY-MM-DD'),count(distinct post_id) 
from some_table where category_id=1 and entity_id = 77  and entity2_id = 115 
and date <= '2008-12-06' and date >= '2007-12-01' group by 
date_trunc('month',date) order by date_trunc('month',date);

  to_char   | count 
------------+-------
 2007-12-01 |    64
 2008-01-01 |    31
 2008-02-01 |    14
 2008-03-01 |    21
 2008-04-01 |    28
 2008-05-01 |    44
 2008-06-01 |   100
 2008-07-01 |    72
 2008-08-01 |    91
 2008-09-01 |    92
 2008-10-01 |    79
 2008-11-01 |    65
(12 rows)

But some of them skip intervals for which no data is present, like this one:

select to_char(date_trunc('month',date), 'YYYY-MM-DD'),count(distinct post_id) 
from some_table where category_id=1 and entity_id = 75  and entity2_id = 115 
and date <= '2008-12-06' and date >= '2007-12-01' group by 
date_trunc('month',date) order by date_trunc('month',date);

  to_char   | count 
------------+-------
 2007-12-01 |     2
 2008-01-01 |     2
 2008-03-01 |     1
 2008-04-01 |     2
 2008-06-01 |     1
 2008-08-01 |     3
 2008-10-01 |     2
(7 rows)

The required result set is:

  to_char   | count 
------------+-------
 2007-12-01 |     2
 2008-01-01 |     2
 2008-02-01 |     0
 2008-03-01 |     1
 2008-04-01 |     2
 2008-05-01 |     0
 2008-06-01 |     1
 2008-07-01 |     0
 2008-08-01 |     3
 2008-09-01 |     0
 2008-10-01 |     2
 2008-11-01 |     0
(12 rows)

That is, intervals with no entries should appear with a count of 0.

I have seen earlier discussions on Stack Overflow, but they don't seem to solve my problem, since my grouping period is one of (day, week, month, quarter, year) and is decided at runtime by the application. So an approach like a left join with a calendar table or sequence table will not help, I guess.

My current solution is to fill in these gaps in Python (in a TurboGears app) using the calendar module.

Is there a better way to do this?

JV. asked Dec 06 '08 09:12


1 Answer

This question is old, but since fellow users picked it as the master for a new duplicate, I am adding a proper answer.

Proper solution

SELECT *
FROM  (
   SELECT day::date
   FROM   generate_series(timestamp '2007-12-01'
                        , timestamp '2008-12-01'
                        , interval  '1 month') day
   ) d
LEFT   JOIN (
   SELECT date_trunc('month', date_col)::date AS day
        , count(*) AS some_count
   FROM   tbl
   WHERE  date_col >= date '2007-12-01'
   AND    date_col <= date '2008-12-06'
-- AND    ... more conditions
   GROUP  BY 1
   ) t USING (day)
ORDER  BY day;

  • Use LEFT JOIN, of course, so that intervals without matching rows are kept.

  • generate_series() can produce a table of timestamps on the fly, and very fast. The timestamp variant requires PostgreSQL 8.4 or later.

  • It's generally faster to aggregate before you join. I recently provided a test case on sqlfiddle.com in this related answer:

    • PostgreSQL - order by an array
  • Cast the timestamp to date (::date) for a basic format. For more formatting options, use to_char().

  • GROUP BY 1 is shorthand for referencing the first output column. It could be GROUP BY day as well, but that might conflict with an existing column of the same name. Or GROUP BY date_trunc('month', date_col)::date, but that's too long for my taste.

  • Works with the available interval arguments for date_trunc().

  • count() never produces NULL (it returns 0 for no rows), but the LEFT JOIN does for intervals without a match.
    To return 0 instead of NULL in the outer SELECT, use COALESCE(some_count, 0) AS some_count. See the manual.

  • For a more generic solution or arbitrary time intervals consider this closely related answer:

    • Best way to count records by arbitrary time intervals in Rails+Postgres
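
As a minimal sketch of how the points above combine, the following variant wraps the count in COALESCE() and truncates the series start with date_trunc() so the generated values line up with the grouped ones. The temporary table tbl, its columns post_id and date_col, and the four sample rows are made up for illustration; they only stand in for the real data.

```sql
-- Hypothetical sample data standing in for the real table.
CREATE TEMP TABLE tbl (post_id int, date_col date);
INSERT INTO tbl VALUES
   (1, '2007-12-03')
 , (2, '2007-12-20')
 , (3, '2008-01-15')
 , (4, '2008-03-02');

SELECT day
     , COALESCE(t.some_count, 0) AS some_count  -- 0 instead of NULL for empty intervals
FROM  (
   -- Truncating the start keeps the series aligned with date_trunc() below.
   SELECT day::date
   FROM   generate_series(date_trunc('month', timestamp '2007-12-01')
                        , timestamp '2008-03-31'
                        , interval  '1 month') day
   ) d
LEFT   JOIN (
   SELECT date_trunc('month', date_col)::date AS day
        , count(DISTINCT post_id) AS some_count
   FROM   tbl
   WHERE  date_col >= date '2007-12-01'
   AND    date_col <= date '2008-03-31'
   GROUP  BY 1
   ) t USING (day)
ORDER  BY day;

-- Result with the sample rows above:
--  2007-12-01 | 2
--  2008-01-01 | 1
--  2008-02-01 | 0
--  2008-03-01 | 1
```

For a grouping period chosen at runtime, the application would presumably swap 'month' in both date_trunc() calls and adjust the step to match, for example interval '1 week' for 'week' or interval '3 months' for 'quarter'. Truncating the start matters most for weeks, where date_trunc() returns the Monday of the week rather than the given start date.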
Erwin Brandstetter answered Sep 20 '22 16:09