Let's say we have a Product table, an Order table, and a junction table, ProductOrder.
ProductOrder will have a ProductID and an OrderID.
In most of our systems these tables also have an autonumber column called ID.
What is the best practice for placing the primary key (and therefore the clustered index key)?
Should I keep the primary key on the ID field and create a non-clustered index for the foreign key pair (ProductID and OrderID)?
Or should I put the primary key on the foreign key pair (ProductID and OrderID) and put a non-clustered index on the ID column (if even necessary)?
Or ... (smart remark by one of you :))
A junction table contains the primary key columns of the two tables you want to relate. You then create a relationship from the primary key columns of each of those two tables to the matching columns in the junction table. In the pubs database, the titleauthor table is a junction table.
By default a primary key becomes the clustered index key too, but this is not a requirement. The primary key is a logical concept: it is the key used in your data model to reference entities. The clustered index key is a physical concept: it is the order in which you want the rows to be stored on disk.
We can put the primary key constraint and the clustered index on different columns in the same table, or on the same column. It's common practice to cluster on the primary key: since the primary key is often used to join data, it's frequently used in searches.
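As a rough sketch of keeping the two concepts apart, the T-SQL below leaves the primary key on the surrogate ID but clusters the rows on the foreign key pair. The `dbo` schema, the `INT` column types, and the constraint and index names are assumptions for illustration, and the foreign key constraints to Product and Order are left out for brevity.

```sql
-- Hypothetical junction table: the surrogate ID remains the primary key
-- (logical identity), declared NONCLUSTERED so it does not dictate
-- the physical row order.
CREATE TABLE dbo.ProductOrder (
    ID        INT IDENTITY(1,1) NOT NULL,
    ProductID INT NOT NULL,
    OrderID   INT NOT NULL,
    CONSTRAINT PK_ProductOrder PRIMARY KEY NONCLUSTERED (ID)
);

-- The clustered index (physical order) goes on the foreign key pair.
-- UNIQUE also prevents the same product/order pair from appearing twice.
CREATE UNIQUE CLUSTERED INDEX CIX_ProductOrder_ProductID_OrderID
    ON dbo.ProductOrder (ProductID, OrderID);
```

Whether this layout beats the default depends on how the table is queried; it is one option, not a recommendation for every workload.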
If your primary key is of type UNIQUEIDENTIFIER, make sure to specify that it's NONCLUSTERED. If you cluster on randomly generated GUIDs (for example from NEWID()), new rows land at arbitrary positions in the index, which causes page splits and fragmentation and can badly hurt insert performance.
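A minimal sketch of that advice might look like the following. The `Customer` table, its columns, and the choice of `CreatedAt` as the clustering key are all hypothetical; the point is only that the GUID primary key is declared NONCLUSTERED and something ever-increasing is clustered instead.

```sql
-- Hypothetical table with a GUID primary key.
CREATE TABLE dbo.Customer (
    CustomerID UNIQUEIDENTIFIER NOT NULL
        CONSTRAINT DF_Customer_CustomerID DEFAULT NEWID(),
    CreatedAt  DATETIME2 NOT NULL
        CONSTRAINT DF_Customer_CreatedAt DEFAULT SYSUTCDATETIME(),
    Name       NVARCHAR(100) NOT NULL,
    -- NONCLUSTERED: the random GUID does not determine physical row order.
    CONSTRAINT PK_Customer PRIMARY KEY NONCLUSTERED (CustomerID)
);

-- Cluster on an increasing column so new rows are appended at the end
-- (assumption: rows are inserted roughly in CreatedAt order).
CREATE CLUSTERED INDEX CIX_Customer_CreatedAt
    ON dbo.Customer (CreatedAt);
```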
I know these words might make you cringe, but "it depends."
It is most likely that you want the physical order to be based on the ProductID and/or OrderID and not the autonumber (surrogate) column, since the autonumber has no natural meaning in your database. You probably want to order the junction table by the same key you use to look up rows in the parent table.
First understand why and how you are using the surrogate key ID in the first place; that will often dictate how you index it. I assume you are using the surrogate key because you are using some framework that works well with single-column keys. If there is no specific design reason, then for a junction table I'd simplify the problem and just remove the autonumber ID, since it brings no other benefit. The primary key then becomes (ProductID, OrderID). If you do keep the ID, you need to at least make sure your index on the (ProductID, OrderID) tuple is unique to preserve data integrity.
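The two alternatives described above could be sketched in T-SQL roughly as follows. The table names `ProductOrderNatural` and `ProductOrderSurrogate` are made up so both variants can sit side by side in a scratch database; column types and constraint names are likewise assumptions.

```sql
-- Variant A: no surrogate ID; the foreign key pair is the primary key.
CREATE TABLE dbo.ProductOrderNatural (
    ProductID INT NOT NULL,
    OrderID   INT NOT NULL,
    CONSTRAINT PK_ProductOrderNatural
        PRIMARY KEY CLUSTERED (ProductID, OrderID)
);

-- Variant B: keep the surrogate ID as the primary key, but enforce
-- uniqueness of the pair so the same link cannot be stored twice.
CREATE TABLE dbo.ProductOrderSurrogate (
    ID        INT IDENTITY(1,1) NOT NULL,
    ProductID INT NOT NULL,
    OrderID   INT NOT NULL,
    CONSTRAINT PK_ProductOrderSurrogate PRIMARY KEY CLUSTERED (ID),
    CONSTRAINT UQ_ProductOrderSurrogate_ProductID_OrderID
        UNIQUE NONCLUSTERED (ProductID, OrderID)
);
```

In Variant B, an attempt to insert a second row with the same (ProductID, OrderID) fails with a unique constraint violation, which is the integrity guarantee the paragraph above asks for.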
Clustered indexes are good for sequential scans and joins when the query needs the results in the same order as the index. So, look at your access patterns: figure out by which key(s) you will be doing sequential, multi-row selects or scans, and by which key you'll be doing random, individual row access. Create the clustered index on the key you'll scan most, and a non-clustered index on the key you'll use for random access. You have to choose, since a table can have only one clustered index.
NOTE: If you have conflicting requirements, there is a technique ("trick") that may help. If all of the columns a query needs are found in an index, the database engine can answer the query from that index alone, without touching the table (a covering index). You can use this fact to store data in more than one order, even if those orders conflict with one another. Just be aware of the pros and cons of adding more fields to an index, and make a conscious decision after understanding the nature and frequency of the queries that will be processed.
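To illustrate the covering-index idea, here is a hedged T-SQL sketch. The `ProductOrderDemo` table and its `Quantity` column are hypothetical; the table is clustered by (ProductID, OrderID), and a second index stores the same data ordered by OrderID.

```sql
-- Hypothetical junction table clustered by product first.
CREATE TABLE dbo.ProductOrderDemo (
    ProductID INT NOT NULL,
    OrderID   INT NOT NULL,
    Quantity  INT NOT NULL,  -- assumed payload column, not in the original question
    CONSTRAINT PK_ProductOrderDemo
        PRIMARY KEY CLUSTERED (ProductID, OrderID)
);

-- Second ordering: by order first. INCLUDE copies Quantity into the
-- index leaf level so the query below needs no lookup into the table.
CREATE NONCLUSTERED INDEX IX_ProductOrderDemo_OrderID_ProductID
    ON dbo.ProductOrderDemo (OrderID, ProductID)
    INCLUDE (Quantity);

-- Every column referenced here lives in the non-clustered index,
-- so the engine can satisfy the query from that index alone.
SELECT ProductID, Quantity
FROM dbo.ProductOrderDemo
WHERE OrderID = 42;
```

The cost is extra storage and extra work on every insert, update, and delete, which is why the note above recommends weighing it against how often such queries actually run.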