Let's say we have a Product table, an Order table and a junction table, ProductOrder. ProductOrder will have a ProductID and an OrderID. In most of our systems these tables also have an autonumber column called ID. What is the best practice for placing the primary key (and therefore the clustered key)?

- Should I keep the primary key on the ID field and create a non-clustered index on the foreign key pair (ProductID, OrderID)?
- Or should I put the primary key on the foreign key pair (ProductID, OrderID) and put a non-clustered index on the ID column (if that is even necessary)?
- Or ... (smart remark by one of you :))


Primary Key / Clustered key for Junction Tables


I know these words might make you cringe, but "it depends."

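It is most likely that you want the rows ordered by ProductID and/or OrderID rather than by the autonumber (surrogate) column, since the autonumber has no natural meaning in your database. You probably want to order the join table by the same key as its parent table (for example, by ProductID if the Product table is clustered on ProductID), so that joins between the two can read both in the same order.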
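A minimal sketch of that idea, assuming SQLite from Python's standard library as a stand-in engine (SQL Server syntax differs, e.g. `PRIMARY KEY CLUSTERED`). In SQLite a `WITHOUT ROWID` table is stored in a B-tree ordered by its primary key, which plays roughly the role of a clustered index. Comparing query plans shows whether reading one product's orders in OrderID order needs an extra sort. The helper `plan_for` and the constants are hypothetical names for this illustration.

```python
import sqlite3

# "All orders for product 1, in order": a typical sequential, multi-row read.
QUERY = "SELECT OrderID FROM ProductOrder WHERE ProductID = 1 ORDER BY OrderID"

# Rows stored in autonumber (surrogate) order: the ID says nothing about the data.
SURROGATE_DDL = """
CREATE TABLE ProductOrder (
    ID        INTEGER PRIMARY KEY AUTOINCREMENT,
    ProductID INTEGER NOT NULL,
    OrderID   INTEGER NOT NULL
)"""

# Rows stored in (ProductID, OrderID) order, similar to a clustered composite key.
NATURAL_DDL = """
CREATE TABLE ProductOrder (
    ProductID INTEGER NOT NULL,
    OrderID   INTEGER NOT NULL,
    PRIMARY KEY (ProductID, OrderID)
) WITHOUT ROWID"""


def plan_for(ddl: str) -> str:
    """Create the junction table from `ddl` and return SQLite's plan for QUERY."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(ddl)
        rows = conn.execute("EXPLAIN QUERY PLAN " + QUERY).fetchall()
        # The human-readable description is the last column of each plan row.
        return " | ".join(row[-1] for row in rows)
    finally:
        conn.close()


if __name__ == "__main__":
    # Expect a full scan plus a temporary sort for the surrogate-keyed table.
    print("surrogate:", plan_for(SURROGATE_DDL))
    # Expect a primary-key search with no separate sort step.
    print("natural:  ", plan_for(NATURAL_DDL))
```

The surrogate-keyed table has to scan everything and then sort, while the composite-keyed table can seek to ProductID = 1 and read the matching rows already in OrderID order.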

1. First understand why and how you are using the surrogate key ID in the first place; that will often dictate how you index it. I assume you are using it because some framework you use works well with single-column keys. If there is no specific design reason for it, then for a join table I'd simplify the problem and just remove the autonumber ID: the primary key becomes (ProductID, OrderID). If you do keep the ID, at minimum make sure your index on the (ProductID, OrderID) tuple is unique, to preserve data integrity.
2. Clustered indexes are good for sequential scans/joins when the query needs the results in the same order as the index. So look at your access patterns: figure out by which key(s) you will be doing sequential, multi-row selects/scans, and by which key you'll be doing random, individual-row access. Create the clustered index on the key you'll scan most, and a non-clustered index on the key you'll use for random access. You have to choose one or the other, because a table can have only one clustered index.
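The two options from the question can be sketched the same way, again assuming SQLite via Python's `sqlite3` as a stand-in (the index name `ix_ProductOrder_OrderID` and the helpers `build`, `accepts_duplicate_pair` and `plan_for_order_lookup` are hypothetical). Option A drops the surrogate and makes (ProductID, OrderID) the primary key, with a secondary index on OrderID for the random-access direction; option B keeps the ID but adds a unique constraint on the pair. For contrast, a table with only the surrogate key is included.

```python
import sqlite3

# Option A: no surrogate; composite primary key, rows stored in key order.
# A secondary (non-clustered) index serves lookups by OrderID.
OPTION_A = [
    """CREATE TABLE ProductOrder (
           ProductID INTEGER NOT NULL,
           OrderID   INTEGER NOT NULL,
           PRIMARY KEY (ProductID, OrderID)
       ) WITHOUT ROWID""",
    "CREATE INDEX ix_ProductOrder_OrderID ON ProductOrder (OrderID)",
]

# Option B: keep the surrogate ID, but enforce uniqueness of the pair.
OPTION_B = [
    """CREATE TABLE ProductOrder (
           ID        INTEGER PRIMARY KEY,
           ProductID INTEGER NOT NULL,
           OrderID   INTEGER NOT NULL,
           UNIQUE (ProductID, OrderID)
       )""",
]

# Surrogate key only: nothing stops duplicate (ProductID, OrderID) pairs.
SURROGATE_ONLY = [
    """CREATE TABLE ProductOrder (
           ID        INTEGER PRIMARY KEY,
           ProductID INTEGER NOT NULL,
           OrderID   INTEGER NOT NULL
       )""",
]


def build(statements):
    """Return an in-memory database containing the given schema."""
    conn = sqlite3.connect(":memory:")
    for stmt in statements:
        conn.execute(stmt)
    return conn


def accepts_duplicate_pair(statements) -> bool:
    """Insert the same (ProductID, OrderID) twice; report whether both inserts succeed."""
    conn = build(statements)
    try:
        insert = "INSERT INTO ProductOrder (ProductID, OrderID) VALUES (?, ?)"
        conn.execute(insert, (1, 100))
        try:
            conn.execute(insert, (1, 100))
        except sqlite3.IntegrityError:
            return False
        return True
    finally:
        conn.close()


def plan_for_order_lookup(statements) -> str:
    """Return the plan for finding all products on one order (random access by OrderID)."""
    conn = build(statements)
    try:
        rows = conn.execute(
            "EXPLAIN QUERY PLAN SELECT ProductID FROM ProductOrder WHERE OrderID = ?",
            (100,),
        ).fetchall()
        return " | ".join(row[-1] for row in rows)
    finally:
        conn.close()


if __name__ == "__main__":
    for name, schema in [("A", OPTION_A), ("B", OPTION_B), ("surrogate only", SURROGATE_ONLY)]:
        print(name, "accepts duplicate pair:", accepts_duplicate_pair(schema))
    print("A, lookup by OrderID:", plan_for_order_lookup(OPTION_A))
```

Both options reject the duplicate pair; only the surrogate-only table accepts it. Which option is better still depends on the access patterns described in point 2.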

NOTE: If you have conflicting requirements, there is a technique ("trick") that may help. If all of the columns a query needs are found in an index (a covering index), the database engine can satisfy the query from that index alone, without reading the table. You can use this fact to effectively store the data in more than one order, even when those orders conflict with one another. Just be aware of the pros and cons of adding more columns to an index (larger indexes and slower writes), and make a conscious decision after understanding the nature and frequency of the queries that will be processed.
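A small illustration of the covering-index trick, again assuming SQLite through Python's `sqlite3` (in SQL Server you would typically widen the index key or use `INCLUDE` columns). The table keeps its autonumber ID, so its rows are stored in ID order; the index name `ix_lookup` and the helper `plan_with_index` are hypothetical. When the index on OrderID also carries ProductID, SQLite reports a COVERING INDEX in the plan, meaning the table itself is not visited.

```python
import sqlite3

# "Which products are on order 100?": random access by OrderID.
QUERY = "SELECT ProductID FROM ProductOrder WHERE OrderID = 100"


def plan_with_index(index_columns: str) -> str:
    """Build the surrogate-keyed table plus one index on `index_columns`; return the plan."""
    conn = sqlite3.connect(":memory:")
    try:
        # Rows are stored in ID (rowid) order, i.e. clustered on the surrogate.
        conn.execute(
            """CREATE TABLE ProductOrder (
                   ID        INTEGER PRIMARY KEY,
                   ProductID INTEGER NOT NULL,
                   OrderID   INTEGER NOT NULL
               )"""
        )
        conn.execute(f"CREATE INDEX ix_lookup ON ProductOrder ({index_columns})")
        rows = conn.execute("EXPLAIN QUERY PLAN " + QUERY).fetchall()
        return " | ".join(row[-1] for row in rows)
    finally:
        conn.close()


if __name__ == "__main__":
    # Index on OrderID only: find matching entries, then visit the table for ProductID.
    print(plan_with_index("OrderID"))
    # Index on (OrderID, ProductID): the index holds every column the query needs.
    print(plan_with_index("OrderID, ProductID"))
```

The wider index is, in effect, a second copy of the data sorted by OrderID, which is exactly the "more than one order" the note describes, paid for with extra storage and write cost.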


answered Nov 02 '22 10:11

codenheim


Tags: sql, clustered-index, junction-table

Zyphrax
