I'm reading all about how clustered indexes work, and think they would be beneficial to my app. I understand that primary keys are automatically clustered indexes, but how would you add a clustered index to a non-primary key column?
I.e. a datastore for user posts. Each post has a ID, but also has a user-id, but since users can post multiple times, the user-id is not a primary key. How would you add a clustered index to the user-id, and is that even a good idea?
When you define a primary key for an InnoDB table, MySQL uses the primary key as the clustered index. If you do not have a primary key for a table, MySQL will search for the first UNIQUE index where all the key columns are NOT NULL and use this UNIQUE index as the clustered index.
In Object Explorer, expand the table on which you want to create a clustered index. Right-click the Indexes folder, point to New Index, and select Clustered Index.... In the New Index dialog box, on the General page, enter the name of the new index in the Index name box.
A clustered index (SQL Server, MySQL/InnoDB) is a table stored in an index B-Tree structure. There is no second data structure (heap-table) for the table. A non-clustered index is an index that refers to another data structure containing further table columns.
Can I create Clustered index without Primary key? Yes, you can create. The main criteria is that the column values should be unique and not null. Indexing improves the performance in case of huge data and has to be mandatory for quick retrieval of data.
According to Clustered and Secondary Indexes, you can have only one clustered index per table.
All indexes other than the clustered index are known as secondary indexes.
If a table has no primary index but another unique index, this is used as the clustered index.
If you do not define a PRIMARY KEY for your table, MySQL locates the first UNIQUE index where all the key columns are NOT NULL and InnoDB uses it as the clustered index.
So, I would conclude, that you don't add a clustered index yourself, but MySQL chooses either the primary or the first unique index of a table as the clustered index.
If you haven't defined a primary or unique index, MySQL creates an index itself
If the table has no PRIMARY KEY or suitable UNIQUE index, InnoDB internally generates a hidden clustered index named GEN_CLUST_INDEX on a synthetic column containing row ID values. The rows are ordered by the ID that InnoDB assigns to the rows in such a table. The row ID is a 6-byte field that increases monotonically as new rows are inserted. Thus, the rows ordered by the row ID are physically in insertion order.
As @Olaf describes, InnoDB chooses which column or column combination will be the clustered index (the primary key, or the first unique index if there is not a primary key, or a hidden column if there is none of the two).
If you want to have a non-unique column as the clustered index, you could define the post_id
as a unique key and make the combination of user_id
and post_id
the primary key which will be chosen as the clustered index:
CREATE TABLE Post
( post_id INT NOT NULL AUTO_INCREMENT
, user_id INT NOT NULL
--- other columns
, CONSTRAINT Post_PK
PRIMARY KEY (user_id, post_id) -- your clustered index
, CONSTRAINT post_id_UQ
UNIQUE (post_id) -- you still want uniqueness for the `post_id`
) ENGINE = InnoDB ;
Whether this is a good idea or not depends on your application, the data volumes and the queries you have. In general the best properties of a clustered key are unique, narrow, static and ever-increasing. That's why auto-incrementing columns are best. Read about it in Kimberly L. Tripp's blog articles: Ever-increasing clustering key - the Clustered Index Debate..........again! and The Clustered Index Debate Continues... (don't stop because they are for SQL-Server, the same issues apply 99% to InnoDB's clustered indexing)
A clustered key like the (user_id, post_id)
has the first 3 properties but it is not ever-increasing. This will result in fragmentation of the CI and possibly slower insertions into the table.
It will however result in more efficient queries that have WHERE user_id = ?
conditions or range conditions WHERE user_id BETWEEN ? AND ?
or GROUP BY user_id
groupings as the wanted data will be found in the clustered index in one place and in the required order.
I suggest you make tests to choose which is best in your case.
There is also a MySQL variant, TokuDB, that allows multiple clustered indexes in a table. Details in their article: Introducing Multiple Clustering Indexes
When you define a primary key for an InnoDB table, MySQL uses the primary key as the clustered index.
If you do not have a primary key for a table, MySQL will search for the first UNIQUE index where all the key columns are NOT NULL and use this UNIQUE index as the clustered index.
In case the InnoDB table has no primary key or suitable UNIQUE index, MySQL internally generates a hidden clustered index named GEN_CLUST_INDEX on a synthetic column which contains the row ID values.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With