I have a database with 4 tables.
CREATE TABLE definition (
id bigserial PRIMARY KEY,
public_id uuid UNIQUE NOT NULL DEFAULT uuid_generate_v4(),
name varchar(128) NOT NULL,
type definition_type NOT NULL,
created_at timestamptz NOT NULL DEFAULT current_timestamp,
updated_at timestamptz NOT NULL DEFAULT current_timestamp,
created_by_token varchar(128) NOT NULL CHECK (created_by_token <> ''),
created_by_user varchar(128)
);
CREATE INDEX ON definition (type);
CREATE INDEX ON definition (created_at);
CREATE INDEX ON definition (updated_at);
CREATE UNIQUE INDEX unique_definition_name ON definition (name);
CREATE INDEX definition_lower_token_idx ON definition (lower(created_by_token));
CREATE INDEX definition_lower_user_idx ON definition (lower(created_by_user));
CREATE TABLE definition_translation (
id bigserial PRIMARY KEY,
language varchar(35) NOT NULL,
definition_id uuid NOT NULL REFERENCES definition(public_id) ON DELETE CASCADE,
created_at timestamptz NOT NULL
updated_at timestamptz NOT NULL
title varchar(64) NOT NULL CHECK (title <> ''),
template text NOT NULL CHECK (template <> ''),
redirect_to text
);
ALTER TABLE definition_translation
ADD CONSTRAINT unique_translation_by_definition_per_language UNIQUE (definition_id, language);
CREATE INDEX ON definition_translation (language);
CREATE INDEX ON definition_translation (definition_id);
CREATE INDEX ON definition_translation (created_at);
CREATE INDEX ON definition_translation (updated_at);
CREATE INDEX ON definition_translation USING gin (title gin_trgm_ops);
CREATE OR REPLACE FUNCTION get_definition_translation(did uuid, lang varchar, default_lang varchar)
RETURNS TABLE (
title varchar(64),
template text,
redirect_to text
) AS $$
BEGIN
RETURN QUERY
SELECT dt.title, dt.template, dt.redirect_to
FROM definition_translation dt JOIN
(values(1,lang),(2,default_lang)) AS lng(ord,code) ON (dt.language = lng.code)
WHERE dt.definition_id = did
ORDER BY lng.ord
LIMIT 1;
END
$$ LANGUAGE plpgsql;
CREATE TABLE broadcast (
id bigserial PRIMARY KEY,
public_id uuid UNIQUE NOT NULL DEFAULT uuid_generate_v4(),
definition_id uuid NOT NULL REFERENCES definition(public_id) ON DELETE CASCADE,
created_at timestamptz NOT NULL DEFAULT current_timestamp,
begin_at timestamptz NOT NULL,
expire_at timestamptz,
created_by_token varchar(128) NOT NULL CHECK (created_by_token <> ''),
created_by_user varchar(128),
replace_values jsonb
);
ALTER TABLE broadcast
ADD CONSTRAINT unique_broadcast_by_definition UNIQUE (definition_id);
CREATE INDEX ON broadcast (created_at);
CREATE INDEX ON broadcast (begin_at);
CREATE INDEX ON broadcast (expire_at);
CREATE INDEX broadcast_lower_token_idx ON broadcast (lower(created_by_token));
CREATE INDEX broadcast_lower_user_idx ON broadcast (lower(created_by_user));
CREATE TABLE broadcast_acknowledgement (
id bigserial PRIMARY KEY,
public_id uuid UNIQUE NOT NULL DEFAULT uuid_generate_v4(),
broadcast_id uuid NOT NULL REFERENCES broadcast(public_id) ON DELETE CASCADE,
account citext NOT NULL,
created_at timestamptz NOT NULL DEFAULT current_timestamp
);
ALTER TABLE broadcast_acknowledgement
ADD CONSTRAINT unique_ack_by_broadcast_per_account UNIQUE (broadcast_id, account);
CREATE INDEX ON broadcast_acknowledgement (account);
CREATE INDEX ON broadcast_acknowledgement (created_at);
The rationale behind this database schemas is to avoid having to create a broadcast row for each user account (milions of them), but instead, have a separate broadcast_acknowledgement
table that store the acknowledgement for each account by broadcast.
The get_definition_translation
function is inspired by this answer: https://stackoverflow.com/a/40018977/4709839
Each broadcast has a "model" stored in the definition
table, and each definition can have one or more translations definition_translation
.
I use this query to fetch all the broadcasts that are not acknowledged by an account. The query joins the definition type of the broadcast and localized columns title
and template
of the definition using a primary language and a fallback in case no translation exists for the primary language.
WITH acknowledged AS (
SELECT ba.broadcast_id
FROM broadcast_acknowledgement AS ba
WHERE ba.account = 'toto'
) SELECT
d.type,
dt.title,
dt.template,
COALESCE(dt.redirect_to, '') AS redirect_to,
src.replace_values,
src.begin_at,
'broadcast:' || src.public_id AS id
FROM broadcast AS src
JOIN definition AS d ON src.definition_id = d.public_id, get_definition_translation(d.public_id , 'fr_FR', 'en_US') AS dt
WHERE (src.public_id NOT IN (SELECT broadcast_id FROM acknowledged)
AND src.begin_at <= current_timestamp
AND src.expire_at > current_timestamp);
The query use a view to fetch all the broadcasts acknowledged by an account and use the rows to filter the selected broadcasts in the main query.
In the JOIN clause, the get_definition_translation
is responsible for returning the translation of the definition that is referenced by the broadcast. It takes a primary and fallback value for the language.
The query plan looks like:
Nested Loop (cost=851.39..7815.32 rows=250000 width=149) (actual time=1.400..12.740 rows=288 loops=1)
Output: d.type, dt.title, dt.template, COALESCE(dt.redirect_to, ''::text), src.replace_values, src.begin_at, ('broadcast:'::text || (src.public_id)::text)
CTE acknowledged
-> Bitmap Heap Scan on public.broadcast_acknowledgement ba (cost=6.18..823.06 rows=209 width=16) (actual time=0.159..0.495 rows=212 loops=1)
Output: ba.broadcast_id
Recheck Cond: (ba.account = 'toto'::citext)
Heap Blocks: exact=210
-> Bitmap Index Scan on broadcast_acknowledgement_account_idx (cost=0.00..6.13 rows=209 width=0) (actual time=0.136..0.136 rows=212 loops=1)
Index Cond: (ba.account = 'toto'::citext)
-> Hash Join (cost=28.08..117.01 rows=250 width=53) (actual time=1.284..3.521 rows=288 loops=1)
Output: src.replace_values, src.begin_at, src.public_id, d.type, d.public_id
Hash Cond: (d.public_id = src.definition_id)
-> Seq Scan on public.definition d (cost=0.00..77.04 rows=2504 width=20) (actual time=0.010..1.054 rows=2504 loops=1)
Output: d.id, d.public_id, d.name, d.type, d.created_at, d.updated_at, d.created_by_token, d.created_by_user
-> Hash (cost=24.95..24.95 rows=250 width=49) (actual time=1.253..1.253 rows=288 loops=1)
Output: src.replace_values, src.begin_at, src.public_id, src.definition_id
Buckets: 1024 Batches: 1 Memory Usage: 33kB
-> Seq Scan on public.broadcast src (cost=4.70..24.95 rows=250 width=49) (actual time=0.834..1.098 rows=288 loops=1)
Output: src.replace_values, src.begin_at, src.public_id, src.definition_id
Filter: ((NOT (hashed SubPlan 2)) AND (src.begin_at <= now()) AND (src.expire_at > now()))
Rows Removed by Filter: 212
SubPlan 2
-> CTE Scan on acknowledged (cost=0.00..4.18 rows=209 width=16) (actual time=0.162..0.702 rows=212 loops=1)
Output: acknowledged.broadcast_id
-> Function Scan on public.get_definition_translation dt (cost=0.25..10.25 rows=1000 width=96) (actual time=0.029..0.030 rows=1 loops=288)
Output: dt.title, dt.template, dt.redirect_to
Function Call: get_definition_translation(d.public_id, 'fr_FR'::character varying, 'en_US'::character varying)
Planning time: 0.621 ms
Execution time: 12.946 ms
(29 rows)
The database was populated with randomly generated data to test the performances of the query.
root=# SELECT COUNT(*) FROM broadcast;
count
-------
500
(1 row)
root=# SELECT COUNT(*) FROM broadcast_acknowledgement;
count
----------
15306826
(1 row)
root=# SELECT COUNT(*) FROM definition;
count
-------
2504
(1 row)
root=# SELECT COUNT(*) FROM definition_translation;
count
-------
47469
(1 row)
I would like to optimize the query to reduce the time taken by the Nested Loop and Function Scan. I cannot figure why the query planner has a bad rows count estimate for those two parts as well, and wonder if this is impacting performances.
Thanks
The misestimate is a consequence of the fact that functions are “black boxes” for the PostgreSQL optimizer (from PostgreSQL 12 on, the new feature "support functions" can mitigate that, but that requires writing C code).
To get a better estimate, tell the optimizer how many rows the function normally returns, so that it doesn't have to use its guessed number of 1000:
ALTER FUNCTION get_definition_translation(did uuid, lang varchar, default_lang varchar)
ROWS 1;
You'd be much better off if you didn't use a function, but substitute the actual query.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With