
PostgreSQL query to count/group by day and display days with no data

I need to create a PostgreSQL query that returns

  • a day
  • the number of objects found for that day

It's important that every single day appear in the results, even if no objects were found on that day. (This has been discussed before but I haven't been able to get things working in my specific case.)

First, I found a SQL query that generates a range of days, which I can join against:

    SELECT to_char(date_trunc('day', (current_date - offs)), 'YYYY-MM-DD') AS date
    FROM generate_series(0, 365, 1) AS offs

Results in:

        date
    ------------
     2013-03-28
     2013-03-27
     2013-03-26
     2013-03-25
     ...
     2012-03-28
    (366 rows)

Now I'm trying to join that to a table named 'sharer_emailshare' which has a 'created' column:

    Table 'public.sharer_emailshare'
     column  |           type
    ---------+--------------------------
     id      | integer
     created | timestamp with time zone
     message | text
     to      | character varying(75)

Here's the best GROUP BY query I have so far:

    SELECT d.date, count(se.id)
    FROM (
        SELECT to_char(date_trunc('day', (current_date - offs)), 'YYYY-MM-DD') AS date
        FROM generate_series(0, 365, 1) AS offs
    ) d
    JOIN sharer_emailshare se
        ON (d.date = to_char(date_trunc('day', se.created), 'YYYY-MM-DD'))
    GROUP BY d.date;

The results:

        date    | count
    ------------+-------
     2013-03-27 |    11
     2013-03-24 |     2
     2013-02-14 |     2
    (3 rows)

Desired results:

        date    | count
    ------------+-------
     2013-03-28 |     0
     2013-03-27 |    11
     2013-03-26 |     0
     2013-03-25 |     0
     2013-03-24 |     2
     2013-03-23 |     0
     ...
     2012-03-28 |     0
    (366 rows)

If I understand correctly this is because I'm using a plain (implied INNER) JOIN, and this is the expected behavior, as discussed in the postgres docs.

I've looked through dozens of StackOverflow solutions, and all the ones with working queries seem specific to MySQL/Oracle/MSSQL and I'm having a hard time translating them to PostgreSQL.

The guy asking this question found his answer, with Postgres, but put it on a pastebin link that expired some time ago.

I've tried to switch to LEFT OUTER JOIN, RIGHT JOIN, RIGHT OUTER JOIN, CROSS JOIN, use a CASE statement to sub in another value if null, COALESCE to provide a default value, etc, but I haven't been able to use them in a way that gets me what I need.

Any assistance is appreciated! And I promise I'll get around to reading that giant PostgreSQL book soon ;)

asked Mar 28 '13 20:03 by Marcel Chastain



2 Answers

You just need a left outer join instead of an inner join:

    SELECT d.date, count(se.id)
    FROM (
        SELECT to_char(date_trunc('day', (current_date - offs)), 'YYYY-MM-DD') AS date
        FROM generate_series(0, 365, 1) AS offs
    ) d
    LEFT OUTER JOIN sharer_emailshare se
        ON d.date = to_char(date_trunc('day', se.created), 'YYYY-MM-DD')
    GROUP BY d.date;
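
To see why the outer join matters, and why the count is taken over a column of the joined table rather than with count(*), here is a small self-contained sketch. The three-day range and the inline shares rows are made-up stand-ins for sharer_emailshare, not data from the question, and plain timestamps are used so the result does not depend on the session time zone.

```sql
-- Hypothetical stand-in data: two shares on one day, none on the other two.
WITH days(day) AS (
    VALUES (DATE '2013-03-26'), (DATE '2013-03-27'), (DATE '2013-03-28')
),
shares(id, created) AS (
    VALUES (1, TIMESTAMP '2013-03-27 09:00:00'),
           (2, TIMESTAMP '2013-03-27 17:30:00')
)
SELECT d.day,
       count(s.id) AS shares,       -- counts only non-NULL ids: 0 on empty days
       count(*)    AS joined_rows   -- also counts the NULL-extended row: 1 on empty days
FROM days d
LEFT JOIN shares s ON s.created::date = d.day
GROUP BY d.day
ORDER BY d.day;
```

With a plain JOIN in place of LEFT JOIN, the two empty days drop out of the result entirely, which is the behavior described in the question.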
answered Sep 21 '22 09:09 by Gordon Linoff


Extending Gordon Linoff's helpful answer, I would suggest a few improvements:

  • Use ::date instead of date_trunc('day', ...)
  • Join on a date type rather than a character type (it's cleaner).
  • Use specific date ranges so they're easier to change later. In this case I select a year before the most recent entry in the table - something that couldn't have been done easily with the other query.
  • Compute the totals for an arbitrary subquery (using a CTE). You just have to cast the column of interest to the date type and call it date_column.
  • Include a column for cumulative total. (Why not?)
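
As a minimal illustration of the first two points only, the sketch below keeps the question's fixed 366-day window but builds the series from dates and joins on the date type. It assumes the sharer_emailshare table and created column from the question; the aliases d and day are just names chosen here.

```sql
-- Sketch only: table and column names are taken from the question.
SELECT d.day::date  AS date,
       count(se.id) AS count
FROM generate_series(current_date - 365, current_date, interval '1 day') AS d(day)
LEFT JOIN sharer_emailshare se
       ON se.created::date = d.day::date   -- compare dates, not formatted strings
GROUP BY d.day
ORDER BY d.day DESC;
```

Note that created::date, like date_trunc('day', created), interprets a timestamp with time zone in the session's TimeZone setting, so the day boundaries follow that setting.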

Here's my query:

    WITH dates_table AS (
        SELECT created::date AS date_column
        FROM sharer_emailshare
        WHERE showroom_id=5
    )
    SELECT series_table.date,
           COUNT(dates_table.date_column),
           SUM(COUNT(dates_table.date_column)) OVER (ORDER BY series_table.date)
    FROM (
        SELECT (last_date - b.offs) AS date
        FROM (
            SELECT GENERATE_SERIES(0, last_date - first_date, 1) AS offs, last_date
            FROM (
                SELECT MAX(date_column) AS last_date,
                       (MAX(date_column) - '1 year'::interval)::date AS first_date
                FROM dates_table
            ) AS a
        ) AS b
    ) AS series_table
    LEFT OUTER JOIN dates_table
        ON (series_table.date = dates_table.date_column)
    GROUP BY series_table.date
    ORDER BY series_table.date

I tested the query, and it produces the same results, plus the column for cumulative total.
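
The cumulative column relies on a window function applied to an aggregate, which can be tried in isolation. The sketch below uses made-up rows (the joined CTE and its values are hypothetical), where a NULL id stands for a day that the LEFT JOIN padded with no matching data.

```sql
-- Hypothetical rows shaped like the output of the LEFT JOIN:
-- one row per matching record, or a single NULL-id row for an empty day.
WITH joined(day, id) AS (
    VALUES (DATE '2013-03-24', 1),
           (DATE '2013-03-24', 2),
           (DATE '2013-03-25', NULL),
           (DATE '2013-03-26', 3)
)
SELECT day,
       count(id)                          AS daily_count,
       sum(count(id)) OVER (ORDER BY day) AS running_total  -- window over the grouped rows
FROM joined
GROUP BY day
ORDER BY day;
```

Here the daily counts are 2, 0 and 1, and the running total is 2, 2 and 3: the window function is evaluated after GROUP BY, so it sums the per-day counts in date order.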

answered Sep 20 '22 09:09 by Travis