How to tell when a Postgres table was clustered and what indexes were used

Tags:

postgresql

query-performance

I've been impressed by the performance improvements achieved with clustering, but not with how long it takes.

I know clustering needs to be rebuilt if a table or partition is changed after the clustering, but unless I've made a note of when I last clustered a table, how can I tell when I need to do it again?

I can use this query to tell me which table(s) have one or more clustered indexes:

SELECT *
FROM   pg_class c
JOIN   pg_index i ON i.indrelid = c.oid
WHERE  relkind = 'r' AND relhasindex AND i.indisclustered

My questions are:

How can I tell which indexes have been clustered?
Is there any way of finding out exactly when a table was last clustered?
How can I tell if a clustered index is still 'valid'? In other words, how can I tell whether a table/index has changed enough that I need to rebuild the cluster?

I've noticed that it takes just as long to rebuild a clustered index as it did to build it in the first place (even if the table hasn't been touched in the meantime), so I want to avoid re-clustering unless I know the table needs it.

UPDATE for clarity (I hope)

If I use this command:

CLUSTER tableA USING tableA_idx1;

How can I find out at a later date which index was used, i.e. tableA_idx1 (the table has multiple indexes defined)?
Is it recorded anywhere when this command was run?
I know that the cluster may need to be rebuilt/refreshed/recreated (not sure of the correct phraseology) occasionally using CLUSTER tableA when the table undergoes changes. Is there any way of knowing when the table has changed so much that the clustering no longer helps?

asked Nov 14 '18 11:11

ConanTheGerbil

1 Answer

To tell which index was last used to cluster the table, use the pg_index system catalog.

Query that catalog for all indexes that belong to your table and see which one has indisclustered set. A table can only be clustered by a single index at a time.
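A minimal sketch of that lookup, run against a throwaway table. The names `cluster_demo`, `cluster_demo_idx1` and `cluster_demo_idx2` are hypothetical stand-ins for `tableA` and its indexes. It also shows that once a table has been clustered, PostgreSQL remembers the index, so a plain `CLUSTER` with no `USING` clause re-clusters on that same index and the lookup still reports it afterwards.

```sql
-- Hypothetical throwaway table with two indexes; substitute your own names.
DROP TABLE IF EXISTS cluster_demo;
CREATE TABLE cluster_demo (id integer, created_at timestamptz);
CREATE INDEX cluster_demo_idx1 ON cluster_demo (id);
CREATE INDEX cluster_demo_idx2 ON cluster_demo (created_at);

-- Equivalent of: CLUSTER tableA USING tableA_idx1;
CLUSTER cluster_demo USING cluster_demo_idx1;

-- Later: re-cluster on the remembered index (no USING clause needed).
CLUSTER cluster_demo;

-- Which index is the table clustered on? At most one row comes back.
SELECT ic.relname AS clustered_index
FROM   pg_index i
JOIN   pg_class ic ON ic.oid = i.indexrelid
WHERE  i.indrelid = 'cluster_demo'::regclass
AND    i.indisclustered;
```

The `'...'::regclass` cast applies the same lower-case folding as unquoted identifiers, so `'tableA'::regclass` finds a table that was created as an unquoted `tableA`.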

There is no way to find out when the table was last clustered, but that's not very interesting anyway. What you want to know is how good the clustering still is.

To find that, query the pg_stats row for the column on which you clustered. If correlation is close to 1 (or -1 for a descending index), you are still good. The further the value drifts toward 0, the more a re-CLUSTER is indicated.
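A minimal sketch of that check, again on a hypothetical throwaway table `cluster_demo`. The rows are inserted in random order, clustered on `id`, and then `ANALYZE`d, because `pg_stats` only reflects the most recent `ANALYZE` (manual or via autovacuum). The catalog query lists every clustered index in the database together with the leading column of that index (`indkey[0]`) and that column's `correlation`, least well-ordered first. Expression index columns have no matching `pg_attribute` row and are skipped by this sketch.

```sql
-- Hypothetical throwaway table, filled in random physical order.
DROP TABLE IF EXISTS cluster_demo;
CREATE TABLE cluster_demo (id integer, payload text);
INSERT INTO cluster_demo
SELECT g, md5(g::text)
FROM   generate_series(1, 10000) AS g
ORDER  BY random();
CREATE INDEX cluster_demo_idx1 ON cluster_demo (id);

-- Rewrite the table in index order, then refresh the statistics.
CLUSTER cluster_demo USING cluster_demo_idx1;
ANALYZE cluster_demo;

-- Correlation of the leading column of every clustered index.
SELECT n.nspname  AS schema_name,
       c.relname  AS table_name,
       ic.relname AS index_name,
       a.attname  AS column_name,
       s.correlation
FROM   pg_index i
JOIN   pg_class c     ON c.oid = i.indrelid
JOIN   pg_namespace n ON n.oid = c.relnamespace
JOIN   pg_class ic    ON ic.oid = i.indexrelid
JOIN   pg_attribute a ON a.attrelid = c.oid
                     AND a.attnum = i.indkey[0]
LEFT JOIN pg_stats s  ON s.schemaname = n.nspname
                     AND s.tablename  = c.relname
                     AND s.attname    = a.attname
                     AND NOT s.inherited
WHERE  i.indisclustered
ORDER  BY abs(s.correlation) NULLS FIRST;
```

Right after `CLUSTER` the value should be at or very near 1. After heavy `UPDATE`/`INSERT` activity and a fresh `ANALYZE` it will typically fall; where to draw the line for re-clustering is a judgment call for your workload.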

answered Oct 07 '22 01:10

Laurenz Albe
