In PostgreSQL, when a combination of multiple columns is specified as the <code>PRIMARY KEY</code>, how are the records ordered? This is with the assumption that PostgreSQL orders the records in the order of the primary key. Does it? Also, is the primary key automatically indexed in case of PostgreSQL?

This question makes the misguided assumption that the primary key imposes a table order at all. It doesn't. PostgreSQL tables have no defined order, with or without a primary key; they're a "heap" of rows arranged in page blocks. Ordering is imposed using the <code>ORDER BY</code> clause of queries when desired. You might be thinking that PostgreSQL tables are stored as index-oriented tables that're stored on disk in primary key order, but that isn't how Pg works. I think InnoDB stores tables organized by the primary key (but haven't checked), and it's optional in some other vendors' databases using a feature often called "clustered indexes" or "index-organized tables". This feature isn't currently supported by PostgreSQL (as of 9.3 at least). That said, the <code>PRIMARY KEY</code> is implemented using a <code>UNIQUE</code> index, and there is an ordering to that index. It is sorted in ascending order from the left column of the index (and therefore the primary key) onward, as if it were <code>ORDER BY col1 ASC, col2 ASC, col3 ASC;</code>. The same is true of any other b-tree (as distinct from GiST or GIN) index in PostgreSQL, as they're implemented using b+trees. So in the table: <pre class="prettyprint"><code>CREATE TABLE demo ( a integer, b text, PRIMARY KEY(a,b) ); </code></pre> the system will automatically create the equivalent of: <pre class="prettyprint"><code>CREATE UNIQUE INDEX demo_pkey ON demo(a ASC, b ASC); </code></pre> This is reported to you when you create a table, eg: <pre class="prettyprint"><code>regress=> CREATE TABLE demo ( regress(> a integer, regress(> b text, regress(> PRIMARY KEY(a,b) regress(> ); NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "demo_pkey" for table "demo" CREATE TABLE </code></pre> You can see this index when examining the table: <pre class="prettyprint"><code>regress=> \d demo Table "public.demo" Column | Type | Modifiers --------+---------+----------- a | integer | not null b | text | not null Indexes: "demo_pkey" PRIMARY KEY, btree (a, b) </code></pre> You can <code>CLUSTER</code> on this index to re-order the table according to the primary key, but it's a one-time operation. The system won't maintain that ordering - though if there's space free in the pages due to a non-default <code>FILLFACTOR</code> I think it will try to. One consequence of the inherent ordering of the index (but not the heap) is that it is much faster to search for: <pre class="prettyprint"><code>SELECT * FROM demo ORDER BY a, b; SELECT * FROM demo ORDER BY a; </code></pre> than: <pre class="prettyprint"><code>SELECT * FROM demo ORDER BY a DESC, b; </code></pre> and neither of these can use the primary key index at all, they'll do a seqscan unless you have an index on <code>b</code>: <pre class="prettyprint"><code>SELECT * FROM demo ORDER BY b, a; SELECT * FROM demo ORDER BY b; </code></pre> This is becaues PostgreSQL can use an index on <code>(a,b)</code> almost as fast as an index on <code>(a)</code> alone. It cannot use an index on <code>(a,b)</code> as if it were an index on <code>(b)</code> alone - not even slowly, it just can't. As for the <code>DESC</code> entry, for that one Pg must do a reverse index scan, which is slower than an ordinary forward index scan. If you're seeing lots of reverse index scans in <code>EXPLAIN ANALYZE</code> and you can afford the performance cost of the extra index you can create an index on the field in <code>DESC</code> order. This is true for <code>WHERE</code> clauses, not just <code>ORDER BY</code>. You can use an index on <code>(a,b)</code> to search for <code>WHERE a = 4</code> or <code>WHERE a = 4 AND b = 3</code> but not to search for <code>WHERE b = 3</code> alone.

What is the order of records in a table with a composite primary key

1 Answers

This question makes the misguided assumption that the primary key imposes a table order at all. It doesn't. PostgreSQL tables have no defined order, with or without a primary key; they're a "heap" of rows arranged in page blocks. Ordering is imposed using the ORDER BY clause of queries when desired.

You might be thinking that PostgreSQL tables are stored as index-oriented tables that're stored on disk in primary key order, but that isn't how Pg works. I think InnoDB stores tables organized by the primary key (but haven't checked), and it's optional in some other vendors' databases using a feature often called "clustered indexes" or "index-organized tables". This feature isn't currently supported by PostgreSQL (as of 9.3 at least).

That said, the PRIMARY KEY is implemented using a UNIQUE index, and there is an ordering to that index. It is sorted in ascending order from the left column of the index (and therefore the primary key) onward, as if it were ORDER BY col1 ASC, col2 ASC, col3 ASC;. The same is true of any other b-tree (as distinct from GiST or GIN) index in PostgreSQL, as they're implemented using b+trees.

So in the table:

CREATE TABLE demo (
   a integer,
   b text, 
   PRIMARY KEY(a,b)
);

the system will automatically create the equivalent of:

CREATE UNIQUE INDEX demo_pkey ON demo(a ASC, b ASC);

This is reported to you when you create a table, eg:

regress=>     CREATE TABLE demo (
regress(>        a integer,
regress(>        b text, 
regress(>        PRIMARY KEY(a,b)
regress(>     );
NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index "demo_pkey" for table "demo"
CREATE TABLE

You can see this index when examining the table:

regress=> \d demo
     Table "public.demo"
 Column |  Type   | Modifiers 
--------+---------+-----------
 a      | integer | not null
 b      | text    | not null
Indexes:
    "demo_pkey" PRIMARY KEY, btree (a, b)

You can CLUSTER on this index to re-order the table according to the primary key, but it's a one-time operation. The system won't maintain that ordering - though if there's space free in the pages due to a non-default FILLFACTOR I think it will try to.

One consequence of the inherent ordering of the index (but not the heap) is that it is much faster to search for:

SELECT * FROM demo ORDER BY a, b;
SELECT * FROM demo ORDER BY a;

than:

SELECT * FROM demo ORDER BY a DESC, b;

and neither of these can use the primary key index at all, they'll do a seqscan unless you have an index on b:

SELECT * FROM demo ORDER BY b, a;
SELECT * FROM demo ORDER BY b;

This is becaues PostgreSQL can use an index on (a,b) almost as fast as an index on (a) alone. It cannot use an index on (a,b) as if it were an index on (b) alone - not even slowly, it just can't.

As for the DESC entry, for that one Pg must do a reverse index scan, which is slower than an ordinary forward index scan. If you're seeing lots of reverse index scans in EXPLAIN ANALYZE and you can afford the performance cost of the extra index you can create an index on the field in DESC order.

This is true for WHERE clauses, not just ORDER BY. You can use an index on (a,b) to search for WHERE a = 4 or WHERE a = 4 AND b = 3 but not to search for WHERE b = 3 alone.

answered Sep 19 '22 12:09

Craig Ringer

Related questions
                            
                                Rails + Postgres migration - why am I receiving the error "PG::UndefinedFunction: ERROR: function gen_random_uuid() does not exist"?
                            
                                PostgreSQL parameterized Order By / Limit in table function
                            
                                Postgresql: optimizing columns size for numeric fields
                            
                                Postgres locale error
                            
                                Why even use *DB.exec() or prepared statements in Golang?
                            
                                How to connect to windows postgres Database from WSL
                            
                                Referring to dynamic columns in a postgres query?
                            
                                How to count setof / number of keys of JSON in postgresql?
                            
                                psql.exe - password authentication failed in windows
                            
                                How to restore a single table from a .sql postgresql backup?
                            
                                Install libpq-dev on Mac OS X
                            
                                PostgreSQL - how should I use first_value()?
                            
                                SQL update records with ROW_NUMBER()
                            
                                How to do Bulk insert using Sequelize and node.js
                            
                                Select whole column content in pgAdmin
                            
                                Clean way to use postgresql window functions in django ORM?
                            
                                Calculate sunrise and sunset times for a given GPS coordinate within PostgreSQL
                            
                                How to check if a row exists in a PostgreSQL stored procedure?
                            
                                JDBCTemplate set nested POJO with BeanPropertyRowMapper
                            
                                Get size of large object in PostgreSQL query?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

What is the order of records in a table with a composite primary key

Tags:

sorting

postgresql

primary-key

Abhishek Jain

People also ask

1 Answers

Craig Ringer

Recent Activity

Donate For Us