I have a table of events that has a very similar schema and data distribution as this artificial table that can easily be generated locally:
CREATE TABLE events AS
WITH args AS (
SELECT
300 AS scale_factor, -- feel free to reduce this to speed up local testing
1000 AS pa_count,
1 AS l_count_min,
29 AS l_count_rand,
10 AS c_count,
10 AS pr_count,
3 AS r_count,
'10 days'::interval AS time_range -- edit 2017-05-02: the real data set has years worth of data here, but the query time ranges stay small (a couple days)
)
SELECT
p.c_id,
'ABC'||lpad(p.pa_id::text, 13, '0') AS pa_id,
'abcdefgh-'||((random()*(SELECT pr_count-1 FROM args)+1))::int AS pr_id,
((random()*(SELECT r_count-1 FROM args)+1))::int AS r,
'2017-01-01Z00:00:00'::timestamp without time zone + random()*(SELECT time_range FROM args) AS t
FROM (
SELECT
pa_id,
((random()*(SELECT c_count-1 FROM args)+1))::int AS c_id,
(random()*(SELECT l_count_rand FROM args)+(SELECT l_count_min FROM args))::int AS l_count
FROM generate_series(1, (SELECT pa_count*scale_factor FROM args)) pa_id
) p
JOIN LATERAL (
SELECT generate_series(1, p.l_count)
) l(id) ON (true);
Excerpt from SELECT * FROM events
:
What I need is a query that selects all rows for a given c_id
in a given time range of t
, then filters them down to only include the most recent rows (by t
) for each unique pr_id
and pa_id
combination, and then counts the number of pr_id
and r
combinations of those rows.
That's a quite a mouthful, so here are 3 SQL queries that I came up with that produce the desired results:
WITH query_a AS (
SELECT
pr_id,
r,
count(1) AS quantity
FROM (
SELECT DISTINCT ON (pr_id, pa_id)
pr_id,
pa_id,
r
FROM events
WHERE
c_id = 5 AND
t >= '2017-01-03Z00:00:00' AND
t < '2017-01-06Z00:00:00'
ORDER BY pr_id, pa_id, t DESC
) latest
GROUP BY
1,
2
ORDER BY 3, 2, 1 DESC
),
query_b AS (
SELECT
pr_id,
r,
count(1) AS quantity
FROM (
SELECT
pr_id,
pa_id,
first_not_null(r ORDER BY t DESC) AS r
FROM events
WHERE
c_id = 5 AND
t >= '2017-01-03Z00:00:00' AND
t < '2017-01-06Z00:00:00'
GROUP BY
1,
2
) latest
GROUP BY
1,
2
ORDER BY 3, 2, 1 DESC
),
query_c AS (
SELECT
pr_id,
r,
count(1) AS quantity
FROM (
SELECT
pr_id,
pa_id,
first_not_null(r) AS r
FROM events
WHERE
c_id = 5 AND
t >= '2017-01-03Z00:00:00' AND
t < '2017-01-06Z00:00:00'
GROUP BY
1,
2
) latest
GROUP BY
1,
2
ORDER BY 3, 2, 1 DESC
)
And here is the custom aggregate function used by query_b
and query_c
, as well as what I believe to be the most optimal index, settings and conditions:
CREATE FUNCTION first_not_null_agg(before anyelement, value anyelement) RETURNS anyelement
LANGUAGE sql IMMUTABLE STRICT
AS $_$
SELECT $1;
$_$;
CREATE AGGREGATE first_not_null(anyelement) (
SFUNC = first_not_null_agg,
STYPE = anyelement
);
CREATE INDEX events_idx ON events USING btree (c_id, t DESC, pr_id, pa_id, r);
VACUUM ANALYZE events;
SET work_mem='128MB';
My dilemma is that query_c
outperforms query_a
and query_b
by a factor of > 6x, but is technically not guaranteed to produce the same result as the other queries (notice the missing ORDER BY
in the first_not_null
aggregate). However, in practice it seems to pick a query plan that I believe to be correct and most optimal.
Below are the EXPLAIN (ANALYZE, VERBOSE)
outputs for all 3 queries on my local machine:
query_a
:
CTE Scan on query_a (cost=25810.77..26071.25 rows=13024 width=44) (actual time=3329.921..3329.934 rows=30 loops=1)
Output: query_a.pr_id, query_a.r, query_a.quantity
CTE query_a
-> Sort (cost=25778.21..25810.77 rows=13024 width=23) (actual time=3329.918..3329.921 rows=30 loops=1)
Output: events.pr_id, events.r, (count(1))
Sort Key: (count(1)), events.r, events.pr_id DESC
Sort Method: quicksort Memory: 27kB
-> HashAggregate (cost=24757.86..24888.10 rows=13024 width=23) (actual time=3329.849..3329.892 rows=30 loops=1)
Output: events.pr_id, events.r, count(1)
Group Key: events.pr_id, events.r
-> Unique (cost=21350.90..22478.71 rows=130237 width=40) (actual time=3168.656..3257.299 rows=116547 loops=1)
Output: events.pr_id, events.pa_id, events.r, events.t
-> Sort (cost=21350.90..21726.83 rows=150375 width=40) (actual time=3168.655..3209.095 rows=153795 loops=1)
Output: events.pr_id, events.pa_id, events.r, events.t
Sort Key: events.pr_id, events.pa_id, events.t DESC
Sort Method: quicksort Memory: 18160kB
-> Index Only Scan using events_idx on public.events (cost=0.56..8420.00 rows=150375 width=40) (actual time=0.038..101.584 rows=153795 loops=1)
Output: events.pr_id, events.pa_id, events.r, events.t
Index Cond: ((events.c_id = 5) AND (events.t >= '2017-01-03 00:00:00'::timestamp without time zone) AND (events.t < '2017-01-06 00:00:00'::timestamp without time zone))
Heap Fetches: 0
Planning time: 0.316 ms
Execution time: 3331.082 ms
query_b
:
CTE Scan on query_b (cost=67140.75..67409.53 rows=13439 width=44) (actual time=3761.077..3761.090 rows=30 loops=1)
Output: query_b.pr_id, query_b.r, query_b.quantity
CTE query_b
-> Sort (cost=67107.15..67140.75 rows=13439 width=23) (actual time=3761.074..3761.081 rows=30 loops=1)
Output: events.pr_id, (first_not_null(events.r ORDER BY events.t DESC)), (count(1))
Sort Key: (count(1)), (first_not_null(events.r ORDER BY events.t DESC)), events.pr_id DESC
Sort Method: quicksort Memory: 27kB
-> HashAggregate (cost=66051.24..66185.63 rows=13439 width=23) (actual time=3760.997..3761.049 rows=30 loops=1)
Output: events.pr_id, (first_not_null(events.r ORDER BY events.t DESC)), count(1)
Group Key: events.pr_id, first_not_null(events.r ORDER BY events.t DESC)
-> GroupAggregate (cost=22188.98..63699.49 rows=134386 width=32) (actual time=2961.471..3671.669 rows=116547 loops=1)
Output: events.pr_id, events.pa_id, first_not_null(events.r ORDER BY events.t DESC)
Group Key: events.pr_id, events.pa_id
-> Sort (cost=22188.98..22578.94 rows=155987 width=40) (actual time=2961.436..3012.440 rows=153795 loops=1)
Output: events.pr_id, events.pa_id, events.r, events.t
Sort Key: events.pr_id, events.pa_id
Sort Method: quicksort Memory: 18160kB
-> Index Only Scan using events_idx on public.events (cost=0.56..8734.27 rows=155987 width=40) (actual time=0.038..97.336 rows=153795 loops=1)
Output: events.pr_id, events.pa_id, events.r, events.t
Index Cond: ((events.c_id = 5) AND (events.t >= '2017-01-03 00:00:00'::timestamp without time zone) AND (events.t < '2017-01-06 00:00:00'::timestamp without time zone))
Heap Fetches: 0
Planning time: 0.385 ms
Execution time: 3761.852 ms
query_c
:
CTE Scan on query_c (cost=51400.06..51660.54 rows=13024 width=44) (actual time=524.382..524.395 rows=30 loops=1)
Output: query_c.pr_id, query_c.r, query_c.quantity
CTE query_c
-> Sort (cost=51367.50..51400.06 rows=13024 width=23) (actual time=524.380..524.384 rows=30 loops=1)
Output: events.pr_id, (first_not_null(events.r)), (count(1))
Sort Key: (count(1)), (first_not_null(events.r)), events.pr_id DESC
Sort Method: quicksort Memory: 27kB
-> HashAggregate (cost=50347.14..50477.38 rows=13024 width=23) (actual time=524.311..524.349 rows=30 loops=1)
Output: events.pr_id, (first_not_null(events.r)), count(1)
Group Key: events.pr_id, first_not_null(events.r)
-> HashAggregate (cost=46765.62..48067.99 rows=130237 width=32) (actual time=401.480..459.962 rows=116547 loops=1)
Output: events.pr_id, events.pa_id, first_not_null(events.r)
Group Key: events.pr_id, events.pa_id
-> Index Only Scan using events_idx on public.events (cost=0.56..8420.00 rows=150375 width=32) (actual time=0.027..109.459 rows=153795 loops=1)
Output: events.c_id, events.t, events.pr_id, events.pa_id, events.r
Index Cond: ((events.c_id = 5) AND (events.t >= '2017-01-03 00:00:00'::timestamp without time zone) AND (events.t < '2017-01-06 00:00:00'::timestamp without time zone))
Heap Fetches: 0
Planning time: 0.296 ms
Execution time: 525.566 ms
Broadly speaking, I believe that the index above should allow query_a
and query_b
to be executed without the Sort nodes that slow them down, but so far I've failed to convince the postgres query optimizer to do my bidding.
I'm also somewhat confused about the t
column not being included in the Sort key for query_b
, considering that quicksort is not stable. It seems like this could yield the wrong results.
I've verified that all 3 queries generate the same results running the following queries and verifying they produce an empty result set:
SELECT * FROM query_a
EXCEPT
SELECT * FROM query_b;
and
SELECT * FROM query_a
EXCEPT
SELECT * FROM query_c;
I'd consider query_a
to be the canonical query when in doubt.
I greatly appreciate any input on this. I've actually found a terribly hacky workaround to achieve acceptable performance in my application, but this problem continues to hunt me in my sleep (and in fact vacation, which I'm currently on) ... 😬.
FWIW, I've looked at many similar questions and answers which have guided my current thinking, but I believe there is something unique about the two column grouping (pr_id
, pa_id
) and having to sort by a 3rd column (t
) that doesn't make this a duplicate question.
Edit: The outer queries in the example may be entirely irrelevant to the question, so feel free to ignore them if it helps.
[EDITED] Ok, As this depend of your data distribution here is another way to do it.
First add the following index :
CREATE INDEX events_idx2 ON events (c_id, t DESC, pr_id, pa_id, r);
This extract the MAX(t)
as quick as he can, on the assumption that the sub set will be way smaller to join on the parent table. It may however probably be slower if the dataset is not that small.
SELECT
e.pr_id,
e.r,
count(1) AS quantity
FROM events e
JOIN (
SELECT
pr_id,
pa_id,
MAX(t) last_t
FROM events e
WHERE
c_id = 5
AND t >= '2017-01-03Z00:00:00'
AND t < '2017-01-06Z00:00:00'
GROUP BY
pr_id,
pa_id
) latest
ON (
c_id = 5
AND latest.pr_id = e.pr_id
AND latest.pa_id = e.pa_id
AND latest.last_t = e.t
)
GROUP BY
e.pr_id,
e.r
ORDER BY 3, 2, 1 DESC
SQL Fiddle
PostgreSQL 9.3 Schema Setup:
--PostgreSQL 9.6
--'\\' is a delimiter
-- CREATE TABLE events AS...
VACUUM ANALYZE events;
CREATE INDEX idx_events_idx ON events (c_id, t DESC, pr_id, pa_id, r);
Query 1:
-- query A
explain analyze SELECT
pr_id,
r,
count(1) AS quantity
FROM (
SELECT DISTINCT ON (pr_id, pa_id)
pr_id,
pa_id,
r
FROM events
WHERE
c_id = 5 AND
t >= '2017-01-03Z00:00:00' AND
t < '2017-01-06Z00:00:00'
ORDER BY pr_id, pa_id, t DESC
) latest
GROUP BY
1,
2
ORDER BY 3, 2, 1 DESC
Results:
QUERY PLAN
Sort (cost=2170.24..2170.74 rows=200 width=15) (actual time=358.239..358.245 rows=30 loops=1)
Sort Key: (count(1)), events.r, events.pr_id
Sort Method: quicksort Memory: 27kB
-> HashAggregate (cost=2160.60..2162.60 rows=200 width=15) (actual time=358.181..358.189 rows=30 loops=1)
-> Unique (cost=2012.69..2132.61 rows=1599 width=40) (actual time=327.345..353.750 rows=12098 loops=1)
-> Sort (cost=2012.69..2052.66 rows=15990 width=40) (actual time=327.344..348.686 rows=15966 loops=1)
Sort Key: events.pr_id, events.pa_id, events.t
Sort Method: external merge Disk: 792kB
-> Index Only Scan using idx_events_idx on events (cost=0.42..896.20 rows=15990 width=40) (actual time=0.059..5.475 rows=15966 loops=1)
Index Cond: ((c_id = 5) AND (t >= '2017-01-03 00:00:00'::timestamp without time zone) AND (t < '2017-01-06 00:00:00'::timestamp without time zone))
Heap Fetches: 0
Total runtime: 358.610 ms
Query 2:
-- query max/JOIN
explain analyze SELECT
e.pr_id,
e.r,
count(1) AS quantity
FROM events e
JOIN (
SELECT
pr_id,
pa_id,
MAX(t) last_t
FROM events e
WHERE
c_id = 5
AND t >= '2017-01-03Z00:00:00'
AND t < '2017-01-06Z00:00:00'
GROUP BY
pr_id,
pa_id
) latest
ON (
c_id = 5
AND latest.pr_id = e.pr_id
AND latest.pa_id = e.pa_id
AND latest.last_t = e.t
)
GROUP BY
e.pr_id,
e.r
ORDER BY 3, 2, 1 DESC
Results:
QUERY PLAN
Sort (cost=4153.31..4153.32 rows=1 width=15) (actual time=68.398..68.402 rows=30 loops=1)
Sort Key: (count(1)), e.r, e.pr_id
Sort Method: quicksort Memory: 27kB
-> HashAggregate (cost=4153.29..4153.30 rows=1 width=15) (actual time=68.363..68.371 rows=30 loops=1)
-> Merge Join (cost=1133.62..4153.29 rows=1 width=15) (actual time=35.083..64.154 rows=12098 loops=1)
Merge Cond: ((e.t = (max(e_1.t))) AND (e.pr_id = e_1.pr_id))
Join Filter: (e.pa_id = e_1.pa_id)
-> Index Only Scan Backward using idx_events_idx on events e (cost=0.42..2739.72 rows=53674 width=40) (actual time=0.010..8.073 rows=26661 loops=1)
Index Cond: (c_id = 5)
Heap Fetches: 0
-> Sort (cost=1133.19..1137.19 rows=1599 width=36) (actual time=29.778..32.885 rows=12098 loops=1)
Sort Key: (max(e_1.t)), e_1.pr_id
Sort Method: external sort Disk: 640kB
-> HashAggregate (cost=1016.12..1032.11 rows=1599 width=36) (actual time=12.731..16.738 rows=12098 loops=1)
-> Index Only Scan using idx_events_idx on events e_1 (cost=0.42..896.20 rows=15990 width=36) (actual time=0.029..5.084 rows=15966 loops=1)
Index Cond: ((c_id = 5) AND (t >= '2017-01-03 00:00:00'::timestamp without time zone) AND (t < '2017-01-06 00:00:00'::timestamp without time zone))
Heap Fetches: 0
Total runtime: 68.736 ms
Query 3:
DROP INDEX idx_events_idx
CREATE INDEX idx_events_flutter ON events (c_id, pr_id, pa_id, t DESC, r)
Query 5:
-- query A + index by flutter
explain analyze SELECT
pr_id,
r,
count(1) AS quantity
FROM (
SELECT DISTINCT ON (pr_id, pa_id)
pr_id,
pa_id,
r
FROM events
WHERE
c_id = 5 AND
t >= '2017-01-03Z00:00:00' AND
t < '2017-01-06Z00:00:00'
ORDER BY pr_id, pa_id, t DESC
) latest
GROUP BY
1,
2
ORDER BY 3, 2, 1 DESC
Results:
QUERY PLAN
Sort (cost=2744.82..2745.32 rows=200 width=15) (actual time=20.915..20.916 rows=30 loops=1)
Sort Key: (count(1)), events.r, events.pr_id
Sort Method: quicksort Memory: 27kB
-> HashAggregate (cost=2735.18..2737.18 rows=200 width=15) (actual time=20.883..20.892 rows=30 loops=1)
-> Unique (cost=0.42..2707.20 rows=1599 width=40) (actual time=0.037..16.488 rows=12098 loops=1)
-> Index Only Scan using idx_events_flutter on events (cost=0.42..2627.25 rows=15990 width=40) (actual time=0.036..10.893 rows=15966 loops=1)
Index Cond: ((c_id = 5) AND (t >= '2017-01-03 00:00:00'::timestamp without time zone) AND (t < '2017-01-06 00:00:00'::timestamp without time zone))
Heap Fetches: 0
Total runtime: 20.964 ms
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With