I need to query for each minute the total count of rows up to that minute.
The best I could achieve so far doesn't do the trick. It returns count per minute, not the total count up to each minute:
SELECT COUNT(id) AS count
     , EXTRACT(hour from "when") AS hour
     , EXTRACT(minute from "when") AS minute
FROM   mytable
GROUP  BY hour, minute
The basic SQL standard query to count the rows in a table is: SELECT count(*) FROM table_name; This can be rather slow because PostgreSQL has to check visibility for all rows, due to the MVCC model.
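The PostgreSQL MIN() function is an aggregate function that returns the minimum value in a set of values. Syntax: MIN(expression); MIN() can be used in the SELECT list and in the HAVING clause. In a WHERE clause it can only appear inside a subquery, because aggregate functions are not allowed there directly.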
The PostgreSQL COUNT function counts either rows or the non-NULL values of a specific column. With an asterisk, count(*) returns the total number of rows, because the asterisk stands for all rows.
By default, count() counts everything, including duplicates. DISTINCT is often used alongside count() to count unique values instead. Such a query can use an index-only scan and still take around 3.5 seconds. The speed depends on many factors, including cardinality, the size of the table, and whether the index is cached.
SELECT DISTINCT
       date_trunc('minute', "when") AS minute
     , count(*) OVER (ORDER BY date_trunc('minute', "when")) AS running_ct
FROM   mytable
ORDER  BY 1;
Use date_trunc(); it returns exactly what you need.
Don't include id in the query, since you want to GROUP BY minute slices.
count() is typically used as a plain aggregate function. Appending an OVER clause makes it a window function. Omit PARTITION BY in the window definition: you want a running count over all rows. By default, that counts from the first row to the last peer of the current row as defined by ORDER BY. The manual:
"The default framing option is RANGE UNBOUNDED PRECEDING, which is the same as RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. With ORDER BY, this sets the frame to be all rows from the partition start up through the current row's last ORDER BY peer."
And that happens to be exactly what you need.
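To see the default frame at work, here is a minimal sketch on hypothetical sample data (the events CTE and its four timestamps are made up for illustration). It puts the default RANGE frame next to an explicit ROWS frame for contrast:

```sql
-- Hypothetical sample data: four events in three distinct minutes.
WITH events("when") AS (
   VALUES (timestamp '2024-01-01 10:00:05')
        , (timestamp '2024-01-01 10:00:40')
        , (timestamp '2024-01-01 10:01:10')
        , (timestamp '2024-01-01 10:03:30')
   )
SELECT "when"
       -- default frame: RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
     , count(*) OVER (ORDER BY date_trunc('minute', "when")) AS running_ct
       -- explicit ROWS frame: stops at the current row, peers are not included
     , count(*) OVER (ORDER BY date_trunc('minute', "when")
                      ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS rows_ct
FROM   events
ORDER  BY "when";
```

With the default frame, both rows in minute 10:00 report running_ct = 2, because each one's frame extends to its last peer. With the ROWS frame the two peers get 1 and 2, in whichever order they happen to be processed, which is why the default is the right fit for a per-minute running count.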
Use count(*) rather than count(id). It better fits your question ("count of rows"). It is generally slightly faster than count(id). And, while we might assume that id is NOT NULL, it has not been specified in the question, so count(id) is wrong, strictly speaking, because NULL values are not counted with count(id).
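The difference between the two forms only shows up when NULL values are present. A minimal sketch, using a made-up three-row table t with one NULL id:

```sql
-- Hypothetical data: three rows, one of them with a NULL id.
WITH t(id) AS (VALUES (1), (2), (NULL))
SELECT count(*)  AS all_rows      -- counts every row: 3
     , count(id) AS non_null_ids  -- skips the NULL:   2
FROM   t;
```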
You can't GROUP BY minute slices at the same query level. Aggregate functions are applied before window functions, so the window function count(*) would only see 1 row per minute this way.
You can, however, SELECT DISTINCT, because DISTINCT is applied after window functions.
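The order of evaluation can be checked with a small experiment. This sketch (again on hypothetical sample data in an inline events CTE) groups at the same query level, so the window function runs over the already aggregated rows:

```sql
-- Hypothetical sample data: four events in three distinct minutes.
WITH events("when") AS (
   VALUES (timestamp '2024-01-01 10:00:05')
        , (timestamp '2024-01-01 10:00:40')
        , (timestamp '2024-01-01 10:01:10')
        , (timestamp '2024-01-01 10:03:30')
   )
SELECT date_trunc('minute', "when") AS minute
       -- runs after GROUP BY, so it counts grouped rows (minutes), not events
     , count(*) OVER (ORDER BY date_trunc('minute', "when")) AS running_ct
FROM   events
GROUP  BY 1
ORDER  BY 1;
```

This returns 1, 2, 3 for the three minutes rather than the wanted 2, 3, 4: a running count of minutes with activity, not of rows.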
ORDER BY 1 is just shorthand for ORDER BY date_trunc('minute', "when") here. 1 is a positional reference to the 1st expression in the SELECT list.
Use to_char() if you need to format the result. Like:
SELECT DISTINCT
       to_char(date_trunc('minute', "when"), 'DD.MM.YYYY HH24:MI') AS minute
     , count(*) OVER (ORDER BY date_trunc('minute', "when")) AS running_ct
FROM   mytable
ORDER  BY date_trunc('minute', "when");
SELECT minute, sum(minute_ct) OVER (ORDER BY minute) AS running_ct
FROM  (
   SELECT date_trunc('minute', "when") AS minute
        , count(*) AS minute_ct
   FROM   tbl
   GROUP  BY 1
   ) sub
ORDER  BY 1;
Much like the above, but:
I use a subquery to aggregate and count rows per minute. This way we get 1 row per minute without DISTINCT in the outer SELECT.
Use sum() as window aggregate function now to add up the counts from the subquery.
I found this to be substantially faster with many rows per minute.
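One side effect of switching from count() to sum() is a change of result type: in PostgreSQL, count(*) returns bigint, while sum() over bigint returns numeric. A minimal sketch on hypothetical sample data (the inline tbl CTE is made up), with an optional cast back to bigint:

```sql
-- Hypothetical sample data: four events in three distinct minutes.
WITH tbl("when") AS (
   VALUES (timestamp '2024-01-01 10:00:05')
        , (timestamp '2024-01-01 10:00:40')
        , (timestamp '2024-01-01 10:01:10')
        , (timestamp '2024-01-01 10:03:30')
   )
SELECT minute
     , sum(minute_ct) OVER (ORDER BY minute)            AS running_ct      -- numeric
     , pg_typeof(sum(minute_ct) OVER (ORDER BY minute)) AS running_ct_type
     , (sum(minute_ct) OVER (ORDER BY minute))::bigint  AS running_ct_int  -- cast back
FROM  (
   SELECT date_trunc('minute', "when") AS minute
        , count(*) AS minute_ct
   FROM   tbl
   GROUP  BY 1
   ) sub
ORDER  BY 1;
```

The running totals come out as 2, 3, 4 for the three minutes. Whether the cast matters depends on what the client expects; it is shown here only as an option.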
@GabiMe asked in a comment how to get one row for every minute in the time frame, including those where no event occurred (no row in base table):
SELECT DISTINCT
       minute, count(c.minute) OVER (ORDER BY minute) AS running_ct
FROM  (
   SELECT generate_series(date_trunc('minute', min("when"))
                        , max("when")
                        , interval '1 min')
   FROM   tbl
   ) m(minute)
LEFT   JOIN (SELECT date_trunc('minute', "when") FROM tbl) c(minute) USING (minute)
ORDER  BY 1;
Generate a row for every minute in the time frame between the first and the last event with generate_series(), here directly based on aggregated values from the subquery.
LEFT JOIN to all timestamps truncated to the minute and count. NULL values (where no row exists) do not add to the running count.
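To make the role of the NULL values visible, this sketch runs the same join on hypothetical sample data (an inline tbl CTE with a gap at 10:02) but leaves out DISTINCT and exposes the joined column as matched:

```sql
-- Hypothetical sample data: no event in minute 10:02.
WITH tbl("when") AS (
   VALUES (timestamp '2024-01-01 10:00:05')
        , (timestamp '2024-01-01 10:00:40')
        , (timestamp '2024-01-01 10:01:10')
        , (timestamp '2024-01-01 10:03:30')
   )
SELECT m.minute
     , c.minute AS matched  -- NULL where the minute has no event
     , count(c.minute) OVER (ORDER BY m.minute) AS running_ct
FROM  (
   SELECT generate_series(date_trunc('minute', min("when"))
                        , max("when")
                        , interval '1 min')
   FROM   tbl
   ) m(minute)
LEFT   JOIN (SELECT date_trunc('minute', "when") FROM tbl) c(minute) USING (minute)
ORDER  BY m.minute;
```

The row for 10:02 has matched = NULL and repeats the previous running_ct of 3, since count(c.minute) ignores the NULL. Minute 10:00 appears twice (once per event), which is what DISTINCT collapses in the full query.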
With CTE:
WITH cte AS (
   SELECT date_trunc('minute', "when") AS minute, count(*) AS minute_ct
   FROM   tbl
   GROUP  BY 1
   )
SELECT m.minute
     , COALESCE(sum(cte.minute_ct) OVER (ORDER BY m.minute), 0) AS running_ct
FROM  (
   SELECT generate_series(min(minute), max(minute), interval '1 min')
   FROM   cte
   ) m(minute)
LEFT   JOIN cte USING (minute)
ORDER  BY 1;
Again, aggregate and count rows per minute in the first step; that removes the need for a later DISTINCT.
Unlike count(), sum() can return NULL. Default to 0 with COALESCE.
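The difference is easy to reproduce in isolation. A minimal sketch, using a made-up single-row input t(x) that contains only NULL:

```sql
-- Hypothetical input: one row whose only value is NULL.
SELECT count(x)              AS ct             -- 0
     , sum(x)                AS total          -- NULL
     , COALESCE(sum(x), 0)   AS total_or_zero  -- 0
FROM  (VALUES (NULL::int)) t(x);
```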
With many rows and an index on "when", this version with a subquery was fastest among a couple of variants I tested with Postgres 9.1 - 9.4:
SELECT m.minute
     , COALESCE(sum(c.minute_ct) OVER (ORDER BY m.minute), 0) AS running_ct
FROM  (
   SELECT generate_series(date_trunc('minute', min("when"))
                        , max("when")
                        , interval '1 min')
   FROM   tbl
   ) m(minute)
LEFT   JOIN (
   SELECT date_trunc('minute', "when") AS minute
        , count(*) AS minute_ct
   FROM   tbl
   GROUP  BY 1
   ) c USING (minute)
ORDER  BY 1;
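Timings like these depend heavily on data distribution, hardware, and Postgres version, so it is worth repeating the comparison on your own data. The following is only a sketch of such a test: the temporary table contents, the row count, and the index name tbl_when_idx are assumptions made for illustration.

```sql
-- Hypothetical test data: 100,000 random timestamps within a single day.
CREATE TEMP TABLE tbl AS
SELECT timestamp '2024-01-01' + random() * interval '1 day' AS "when"
FROM   generate_series(1, 100000);

CREATE INDEX tbl_when_idx ON tbl ("when");  -- index name is an assumption
ANALYZE tbl;

-- Variant 1: window function over all rows, then DISTINCT.
EXPLAIN (ANALYZE)
SELECT DISTINCT
       date_trunc('minute', "when") AS minute
     , count(*) OVER (ORDER BY date_trunc('minute', "when")) AS running_ct
FROM   tbl
ORDER  BY 1;

-- Variant 2: aggregate per minute first, then a running sum.
EXPLAIN (ANALYZE)
SELECT minute, sum(minute_ct) OVER (ORDER BY minute) AS running_ct
FROM  (
   SELECT date_trunc('minute', "when") AS minute
        , count(*) AS minute_ct
   FROM   tbl
   GROUP  BY 1
   ) sub
ORDER  BY 1;
```

Compare the reported execution times across several runs rather than a single one, since caching affects the first execution.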