PostgreSQL: running count of rows for a query 'by minute'

Tags: sql, datetime, postgresql, aggregate-functions, window-functions

I need to query for each minute the total count of rows up to that minute.

The best I could achieve so far doesn't do the trick. It returns count per minute, not the total count up to each minute:

    SELECT COUNT(id) AS count
         , EXTRACT(hour from "when") AS hour
         , EXTRACT(minute from "when") AS minute
      FROM mytable
     GROUP BY hour, minute

asked Nov 19 '11 11:11

GabiMe

1 Answer

Return only minutes with activity

Shortest

    SELECT DISTINCT
           date_trunc('minute', "when") AS minute
         , count(*) OVER (ORDER BY date_trunc('minute', "when")) AS running_ct
    FROM   mytable
    ORDER  BY 1;

Use date_trunc(): it truncates each timestamp to the minute, which is exactly what you need.

Don't include id in the query, since you want to GROUP BY minute slices.

count() is typically used as a plain aggregate function. Appending an OVER clause makes it a window function. Omit PARTITION BY in the window definition, since you want a running count over all rows. By default, that counts from the first row to the last peer of the current row as defined by ORDER BY. The manual:

The default framing option is RANGE UNBOUNDED PRECEDING, which is the same as RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. With ORDER BY, this sets the frame to be all rows from the partition start up through the current row's last ORDER BY peer.

And that happens to be exactly what you need.
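To see the effect of the default frame, here is a minimal sketch on a hypothetical inline row set (the VALUES list and the column name ts are made up for illustration and are not part of the question). Rows that fall into the same minute are peers under this ORDER BY, so each of them gets the count up to and including the last of them:

```sql
-- Hypothetical sample data: two events in minute 10:00, one in minute 10:01.
SELECT ts
     , count(*) OVER (ORDER BY date_trunc('minute', ts)) AS running_ct
FROM  (
   VALUES (timestamp '2011-11-19 10:00:05')
        , (timestamp '2011-11-19 10:00:40')
        , (timestamp '2011-11-19 10:01:10')
   ) AS t(ts)
ORDER  BY ts;
-- Both 10:00 rows are peers and get running_ct = 2; the 10:01 row gets 3.
```

Since all peers carry the same running count, the rows of one minute are exact duplicates once the timestamp is truncated as well.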

Use count(*) rather than count(id). It better fits your question ("count of rows"). It is generally slightly faster than count(id). And, while we might assume that id is NOT NULL, it has not been specified in the question, so count(id) is wrong, strictly speaking, because NULL values are not counted with count(id).
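A small sketch of the difference, using a hypothetical three-row set in which one id is NULL:

```sql
-- Hypothetical rows; the second one has a NULL id.
SELECT count(*)  AS ct_rows  -- counts every row:   3
     , count(id) AS ct_ids   -- skips the NULL id:  2
FROM  (VALUES (1), (NULL), (3)) AS t(id);
```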

You can't GROUP BY minute slices at the same query level. Aggregate functions are applied before window functions, so the window function count(*) would only see 1 row per minute this way.
You can, however, SELECT DISTINCT, because DISTINCT is applied after window functions.
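The evaluation order can be observed in a sketch like the following, again on hypothetical inline data (the column names ts, groups_so_far and the VALUES list are assumptions for illustration). With GROUP BY at the same level, count(*) OVER counts grouped rows, not base rows. As a side note that the answer itself does not use: because aggregates run first, an aggregate can be nested inside a window function, so sum(count(*)) OVER (...) would also yield the running count.

```sql
-- Hypothetical sample data: two events in minute 10:00, one in minute 10:01.
SELECT date_trunc('minute', ts)                               AS minute
     , count(*)                                               AS minute_ct      -- plain aggregate per group
     , count(*) OVER (ORDER BY date_trunc('minute', ts))      AS groups_so_far  -- sees 1 row per minute
     , sum(count(*)) OVER (ORDER BY date_trunc('minute', ts)) AS running_ct     -- window over the aggregate
FROM  (
   VALUES (timestamp '2011-11-19 10:00:05')
        , (timestamp '2011-11-19 10:00:40')
        , (timestamp '2011-11-19 10:01:10')
   ) AS t(ts)
GROUP  BY 1
ORDER  BY 1;
-- 10:00 -> minute_ct 2, groups_so_far 1, running_ct 2
-- 10:01 -> minute_ct 1, groups_so_far 2, running_ct 3
```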

ORDER BY 1 is just shorthand for ORDER BY date_trunc('minute', "when") here.
1 is a positional reference to the 1st expression in the SELECT list.

Use to_char() if you need to format the result. Like:

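    -- Format in an outer query: with SELECT DISTINCT, ORDER BY expressions
    -- must appear in the select list, so sort on the unformatted timestamp.
    SELECT to_char(sub.minute, 'DD.MM.YYYY HH24:MI') AS minute, sub.running_ct
    FROM  (
       SELECT DISTINCT
              date_trunc('minute', "when") AS minute
            , count(*) OVER (ORDER BY date_trunc('minute', "when")) AS running_ct
       FROM   mytable
       ) sub
    ORDER  BY sub.minute;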

Fastest

    SELECT minute, sum(minute_ct) OVER (ORDER BY minute) AS running_ct
    FROM  (
       SELECT date_trunc('minute', "when") AS minute
            , count(*) AS minute_ct
       FROM   mytable
       GROUP  BY 1
       ) sub
    ORDER  BY 1;

Much like the above, but:

I use a subquery to aggregate and count rows per minute. This way we get 1 row per minute without DISTINCT in the outer SELECT.

Use sum() as window aggregate function now to add up the counts from the subquery.

I found this to be substantially faster with many rows per minute.

Include minutes without activity

Shortest

@GabiMe asked in a comment how to get one row for every minute in the time frame, including those where no event occurred (no row in base table):

    SELECT DISTINCT
           minute, count(c.minute) OVER (ORDER BY minute) AS running_ct
    FROM  (
       SELECT generate_series(date_trunc('minute', min("when"))
                            ,                      max("when")
                            , interval '1 min')
       FROM   mytable
       ) m(minute)
    LEFT   JOIN (SELECT date_trunc('minute', "when") FROM mytable) c(minute) USING (minute)
    ORDER  BY 1;

Generate a row for every minute in the time frame between the first and the last event with generate_series() - here directly based on aggregated values from the subquery.

LEFT JOIN to all timestamps truncated to the minute and count. NULL values (where no row exists) do not add to the running count.
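The two steps can be tried in isolation with a sketch like this one. The time bounds and the inline VALUES list are hypothetical stand-ins for the aggregated min/max and for the truncated timestamps of the base table:

```sql
-- Step 1: one row per minute between two hypothetical bounds, both inclusive.
SELECT generate_series(timestamp '2011-11-19 10:00'
                     , timestamp '2011-11-19 10:03'
                     , interval '1 min') AS minute;
-- Returns 4 rows: 10:00, 10:01, 10:02, 10:03.

-- Step 2: LEFT JOIN hypothetical events (two in 10:00, one in 10:02).
-- count(c.minute) ignores the NULLs produced for minutes without a match.
SELECT DISTINCT
       m.minute
     , count(c.minute) OVER (ORDER BY m.minute) AS running_ct
FROM   generate_series(timestamp '2011-11-19 10:00'
                     , timestamp '2011-11-19 10:03'
                     , interval '1 min') AS m(minute)
LEFT   JOIN (
   VALUES (timestamp '2011-11-19 10:00')
        , (timestamp '2011-11-19 10:00')
        , (timestamp '2011-11-19 10:02')
   ) AS c(minute) USING (minute)
ORDER  BY 1;
-- 10:00 -> 2, 10:01 -> 2, 10:02 -> 3, 10:03 -> 3
```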

Fastest

With CTE:

    WITH cte AS (
       SELECT date_trunc('minute', "when") AS minute, count(*) AS minute_ct
       FROM   mytable
       GROUP  BY 1
       )
    SELECT m.minute
         , COALESCE(sum(cte.minute_ct) OVER (ORDER BY m.minute), 0) AS running_ct
    FROM  (
       SELECT generate_series(min(minute), max(minute), interval '1 min')
       FROM   cte
       ) m(minute)
    LEFT   JOIN cte USING (minute)
    ORDER  BY 1;

Again, aggregate and count rows per minute in the first step, which removes the need for a later DISTINCT.

Unlike count(), sum() can return NULL. Default to 0 with COALESCE.
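A minimal sketch of that difference, on a hypothetical row set where the first value to be summed is NULL (as it would be for a minute without activity at the start of a series):

```sql
-- Hypothetical rows: x orders the rows, v is the per-row count (NULL = no match).
SELECT x
     , sum(v)   OVER (ORDER BY x)            AS running_sum    -- NULL, 5, 5
     , COALESCE(sum(v) OVER (ORDER BY x), 0) AS running_sum_0  -- 0,    5, 5
     , count(v) OVER (ORDER BY x)            AS running_ct     -- 0,    1, 1
FROM  (VALUES (1, NULL::int), (2, 5), (3, NULL)) AS t(x, v)
ORDER  BY x;
```

sum() only returns NULL while every value in the frame so far is NULL; once a non-NULL value has entered the frame, later NULLs are simply skipped.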

With many rows and an index on "when" this version with a subquery was fastest among a couple of variants I tested with Postgres 9.1 - 9.4:

    SELECT m.minute
         , COALESCE(sum(c.minute_ct) OVER (ORDER BY m.minute), 0) AS running_ct
    FROM  (
       SELECT generate_series(date_trunc('minute', min("when"))
                            ,                      max("when")
                            , interval '1 min')
       FROM   mytable
       ) m(minute)
    LEFT   JOIN (
       SELECT date_trunc('minute', "when") AS minute
            , count(*) AS minute_ct
       FROM   mytable
       GROUP  BY 1
       ) c USING (minute)
    ORDER  BY 1;

answered Sep 28 '22 18:09

Erwin Brandstetter

