Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What is the order of records in a table with a composite primary key

In PostgreSQL, when a combination of multiple columns is specified as the PRIMARY KEY, how are the records ordered?

This is with the assumption that PostgreSQL orders the records in the order of the primary key. Does it?

Also, is the primary key automatically indexed in case of PostgreSQL?

like image 702
Abhishek Jain Avatar asked Nov 02 '12 06:11

Abhishek Jain


People also ask

Does order matter in composite primary key?

Answers. The order of primary key columns isn't a performance consideration for queries that specify all columns of the composite primary key. However, column order is very important for queries that specify only a subset of the columns.

Does order matter with Superkey?

If you need to only enforce uniqueness the order doesn't matter so the key (a, b) will have the same effect of (b, a) .

What is a composite primary key in SQL?

Composite key, or composite primary key, refers to cases where more than one column is used to specify the primary key of a table. In such cases, all foreign keys will also need to include all the columns in the composite key. Note that the columns that make up a composite key can be of different data types.

When would you use a composite primary key in a table?

A Composite Primary Key is created by combining two or more columns in a table that can be used to uniquely identify each row in the table when the columns are combined, but it does not guarantee uniqueness when taken individually, or it can also be understood as a primary key created by combining two or more ...


1 Answers

This question makes the misguided assumption that the primary key imposes a table order at all. It doesn't. PostgreSQL tables have no defined order, with or without a primary key; they're a "heap" of rows arranged in page blocks. Ordering is imposed using the ORDER BY clause of queries when desired.

You might be thinking that PostgreSQL tables are stored as index-oriented tables that're stored on disk in primary key order, but that isn't how Pg works. I think InnoDB stores tables organized by the primary key (but haven't checked), and it's optional in some other vendors' databases using a feature often called "clustered indexes" or "index-organized tables". This feature isn't currently supported by PostgreSQL (as of 9.3 at least).

That said, the PRIMARY KEY is implemented using a UNIQUE index, and there is an ordering to that index. It is sorted in ascending order from the left column of the index (and therefore the primary key) onward, as if it were ORDER BY col1 ASC, col2 ASC, col3 ASC;. The same is true of any other b-tree (as distinct from GiST or GIN) index in PostgreSQL, as they're implemented using b+trees.

So in the table:

CREATE TABLE demo (
   a integer,
   b text, 
   PRIMARY KEY(a,b)
);

the system will automatically create the equivalent of:

CREATE UNIQUE INDEX demo_pkey ON demo(a ASC, b ASC);

This is reported to you when you create a table, eg:

regress=>     CREATE TABLE demo (
regress(>        a integer,
regress(>        b text, 
regress(>        PRIMARY KEY(a,b)
regress(>     );
NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index "demo_pkey" for table "demo"
CREATE TABLE

You can see this index when examining the table:

regress=> \d demo
     Table "public.demo"
 Column |  Type   | Modifiers 
--------+---------+-----------
 a      | integer | not null
 b      | text    | not null
Indexes:
    "demo_pkey" PRIMARY KEY, btree (a, b)

You can CLUSTER on this index to re-order the table according to the primary key, but it's a one-time operation. The system won't maintain that ordering - though if there's space free in the pages due to a non-default FILLFACTOR I think it will try to.

One consequence of the inherent ordering of the index (but not the heap) is that it is much faster to search for:

SELECT * FROM demo ORDER BY a, b;
SELECT * FROM demo ORDER BY a;

than:

SELECT * FROM demo ORDER BY a DESC, b;

and neither of these can use the primary key index at all, they'll do a seqscan unless you have an index on b:

SELECT * FROM demo ORDER BY b, a;
SELECT * FROM demo ORDER BY b;

This is becaues PostgreSQL can use an index on (a,b) almost as fast as an index on (a) alone. It cannot use an index on (a,b) as if it were an index on (b) alone - not even slowly, it just can't.

As for the DESC entry, for that one Pg must do a reverse index scan, which is slower than an ordinary forward index scan. If you're seeing lots of reverse index scans in EXPLAIN ANALYZE and you can afford the performance cost of the extra index you can create an index on the field in DESC order.

This is true for WHERE clauses, not just ORDER BY. You can use an index on (a,b) to search for WHERE a = 4 or WHERE a = 4 AND b = 3 but not to search for WHERE b = 3 alone.

like image 70
Craig Ringer Avatar answered Sep 19 '22 12:09

Craig Ringer