I've got a PostgreSQL table called queries_query
, which has many columns.
Two of these columns, created
and user_sid
, are frequently used together in SQL queries by my application to determine how many queries a given user has done over the past 30 days. It is very, very rare that I query these stats for any time older than the most recent 30 days.
Here is my question:
I've currently created my multi-column index on these two columns by running:
CREATE INDEX CONCURRENTLY some_index_name ON queries_query (user_sid, created)
But I'd like to further restrict the index to only care about those queries in which the created date is within the past 30 days. I've tried doing the following:
CREATE INDEX CONCURRENTLY some_index_name ON queries_query (user_sid, created)
WHERE created >= NOW() - '30 days'::INTERVAL`
But this throws an exception stating that my function must be immutable.
I'd love to get this working so that I can optimize my index, and cut back on the resources Postgres needs to do these repeated queries.
You get an exception using now()
because the function is not IMMUTABLE
(obviously) and, quoting the manual:
All functions and operators used in an index definition must be "immutable" ...
I see two ways to utilize a (much more efficient) partial index:
CREATE INDEX queries_recent_idx ON queries_query (user_sid, created)
WHERE created > '2013-01-07 00:00'::timestamp;
Assuming created
is actually defined as timestamp
. It wouldn't work to provide a timestamp
constant for a timestamptz
column (timestamp with time zone
). The cast from timestamp
to timestamptz
(or vice versa) depends on the current time zone setting and is not immutable. Use a constant of matching data type. Understand the basics of timestamps with / without time zone:
Drop and recreate that index at hours with low traffic, maybe with a cron job on a daily or weekly basis (or whatever is good enough for you). Creating an index is pretty fast, especially a partial index that is comparatively small. This solution also doesn't need to add anything to the table.
Assuming no concurrent access to the table, automatic index recreation could be done with a function like this:
CREATE OR REPLACE FUNCTION f_index_recreate()
RETURNS void
LANGUAGE plpgsql AS
$func$
BEGIN
DROP INDEX IF EXISTS queries_recent_idx;
EXECUTE format('
CREATE INDEX queries_recent_idx
ON queries_query (user_sid, created)
WHERE created > %L::timestamp'
, LOCALTIMESTAMP - interval '30 days'); -- timestamp constant
-- , now() - interval '30 days'); -- alternative for timestamptz
END
$func$;
Call:
SELECT f_index_recreate();
now()
(like you had) is the equivalent of CURRENT_TIMESTAMP
and returns timestamptz
. Cast to timestamp
with now()::timestamp
or use LOCALTIMESTAMP
instead.
db<>fiddle here
Old sqlfiddle
If you have to deal with concurrent access to the table, use DROP INDEX CONCURRENTLY
and CREATE INDEX CONCURRENTLY
. But you can't wrap these commands into a function because, per documentation:
... a regular
CREATE INDEX
command can be performed within a transaction block, butCREATE INDEX CONCURRENTLY
cannot.
So, with two separate transactions:
CREATE INDEX CONCURRENTLY queries_recent_idx2 ON queries_query (user_sid, created)
WHERE created > '2013-01-07 00:00'::timestamp; -- your new condition
Then:
DROP INDEX CONCURRENTLY IF EXISTS queries_recent_idx;
Optionally, rename to old name:
ALTER INDEX queries_recent_idx2 RENAME TO queries_recent_idx;
Add an archived
tag to your table:
ALTER queries_query ADD COLUMN archived boolean NOT NULL DEFAULT FALSE;
UPDATE
the column at intervals of your choosing to "retire" older rows and create an index like:
CREATE INDEX some_index_name ON queries_query (user_sid, created)
WHERE NOT archived;
Add a matching condition to your queries (even if it seems redundant) to allow it to use the index. Check with EXPLAIN ANALYZE
whether the query planner catches on - it should be able to use the index for queries on an newer date. But it won't understand more complex conditions not matching exactly.
You don't have to drop and recreate the index, but the UPDATE
on the table may be more expensive than index recreation and the table gets slightly bigger.
I would go with the first option (index recreation). In fact, I am using this solution in several databases. The second incurs more costly updates.
Both solutions retain their usefulness over time, performance slowly deteriorates as more outdated rows are included in the index.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With