Postgres - how to return rows with 0 count for missing data?

I have unevenly distributed data (with respect to date) covering a few years (2003-2008). I want to query the data for a given start and end date, grouping it by any of the supported intervals (day, week, month, quarter, year) in PostgreSQL 8.3 (http://www.postgresql.org/docs/8.3/static/functions-datetime.html#FUNCTIONS-DATETIME-TRUNC).

The problem is that some queries return a row for every interval in the requested period, like this one:

select to_char(date_trunc('month',date), 'YYYY-MM-DD'),count(distinct post_id) 
from some_table where category_id=1 and entity_id = 77  and entity2_id = 115 
and date <= '2008-12-06' and date >= '2007-12-01' group by 
date_trunc('month',date) order by date_trunc('month',date);
  to_char   | count 
------------+-------
 2007-12-01 |    64
 2008-01-01 |    31
 2008-02-01 |    14
 2008-03-01 |    21
 2008-04-01 |    28
 2008-05-01 |    44
 2008-06-01 |   100
 2008-07-01 |    72
 2008-08-01 |    91
 2008-09-01 |    92
 2008-10-01 |    79
 2008-11-01 |    65
(12 rows)

But other queries skip the intervals for which no data is present, like this one:

select to_char(date_trunc('month',date), 'YYYY-MM-DD'),count(distinct post_id) 
from some_table where category_id=1 and entity_id = 75  and entity2_id = 115 
and date <= '2008-12-06' and date >= '2007-12-01' group by 
date_trunc('month',date) order by date_trunc('month',date);

  to_char   | count 
------------+-------
 2007-12-01 |     2
 2008-01-01 |     2
 2008-03-01 |     1
 2008-04-01 |     2
 2008-06-01 |     1
 2008-08-01 |     3
 2008-10-01 |     2
(7 rows)

The required result set is:

  to_char   | count 
------------+-------
 2007-12-01 |     2
 2008-01-01 |     2
 2008-02-01 |     0
 2008-03-01 |     1
 2008-04-01 |     2
 2008-05-01 |     0
 2008-06-01 |     1
 2008-07-01 |     0
 2008-08-01 |     3
 2008-09-01 |     0
 2008-10-01 |     2
 2008-11-01 |     0
(12 rows)

That is, a count of 0 for the intervals with no entries.

I have seen earlier discussions on Stack Overflow, but they don't seem to solve my problem, since my grouping period is one of (day, week, month, quarter, year) and is decided at runtime by the application. So an approach like a left join with a calendar table or sequence table will not help, I guess.

My current solution is to fill in these gaps in Python (in a TurboGears app) using the calendar module.

Is there a better way to do this?

asked Dec 06 '08 09:12

JV.

1 Answer

This question is old, but since fellow users picked it as the master for a new duplicate, I am adding a proper answer.

Proper solution

SELECT *
FROM  (
   SELECT day::date
   FROM   generate_series(timestamp '2007-12-01'
                        , timestamp '2008-12-01'
                        , interval  '1 month') day
   ) d
LEFT   JOIN (
   SELECT date_trunc('month', date_col)::date AS day
        , count(*) AS some_count
   FROM   tbl
   WHERE  date_col >= date '2007-12-01'
   AND    date_col <= date '2008-12-06'
-- AND    ... more conditions
   GROUP  BY 1
   ) t USING (day)
ORDER  BY day;

- Use LEFT JOIN, of course.
- generate_series() can produce a table of timestamps on the fly, and very fast.
- It's generally faster to aggregate before you join. I recently provided a test case on sqlfiddle.com in this related answer:
  - PostgreSQL - order by an array
- Cast the timestamp to date (::date) for a basic format. For more formatting options, use to_char().
- GROUP BY 1 is syntax shorthand to reference the first output column. It could be GROUP BY day as well, but that might conflict with an existing column of the same name. Or GROUP BY date_trunc('month', date_col)::date, but that's too long for my taste.
- This works with the available interval arguments for date_trunc().
- count() never produces NULL (it returns 0 for no rows), but the LEFT JOIN does.
- To return 0 instead of NULL in the outer SELECT, use COALESCE(some_count, 0) AS some_count (see the manual).
- For a more generic solution or arbitrary time intervals, consider this closely related answer:
  - Best way to count records by arbitrary time intervals in Rails+Postgres
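
The notes on COALESCE and on the interval arguments of date_trunc() can be combined in a variation of the query. The following is only a sketch under assumptions: tbl, date_col and some_count are the same placeholder names as in the query above, and 'week' with a step of '1 week' stands in for whatever unit the application picks at runtime. The start of the series is truncated to the same unit so that the generated values line up with the truncated values from the table.

```sql
-- Sketch only: tbl and date_col are placeholder names, as in the query above.
-- The unit ('week') and the step ('1 week') are the parts the application
-- would swap in at runtime; for 'quarter' the matching step is interval '3 months'.
SELECT d.day
     , COALESCE(t.some_count, 0) AS some_count   -- 0 instead of NULL for empty periods
FROM  (
   SELECT day::date
   FROM   generate_series(date_trunc('week', timestamp '2007-12-01')  -- align start to the unit
                        , timestamp '2008-12-06'
                        , interval  '1 week') day
   ) d
LEFT   JOIN (
   SELECT date_trunc('week', date_col)::date AS day
        , count(*) AS some_count
   FROM   tbl
   WHERE  date_col >= date '2007-12-01'
   AND    date_col <= date '2008-12-06'
-- AND    ... more conditions
   GROUP  BY 1
   ) t USING (day)
ORDER  BY day;
```

Note that generate_series() with timestamp arguments was added in PostgreSQL 8.4. On 8.3, as in the question, the series would presumably have to be built from an integer generate_series() multiplied by the step interval and added to the start date.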

answered Sep 20 '22 16:09

Erwin Brandstetter
