If I had the following query:
select some cols
from tbl_a
INNER JOIN tbl_b ON tbl_a.orderNumber = tbl_b.orderNumber
where tlb_b.status = 'XX'
Assuming both tables have clustered indexes on order number only, would it be better from a performance perspective to extend the clustered index on table b to include the status column referenced in the where clause?
You extend tbl_b to add status after the orderNumber: create clustered index ... on tbl_b(orderNumber, status)
. For the query above there will be no noticeable difference. The plan will still have to scan tbl_b end to end and match every order number in tbl_a (probably a merge join).
You extend tbl_b to add status before the orderNumber: create clustered index ... on tbl_b (status, orderNumber)
. Now there is a HUGE difference. The plan can do a range scan on tbl_b to get only those with Status 'xx' and only match tbl_a for the corrsponding orderNumber, using a nested loop join.
Placing a low selectivity column (like 'status' usually is) as the leftmost key in an index is usually a good thing. And making a row like 'status' the leftmost column in a clustered index is also usually a good thing, because it groups records with same status together physically. Note that doing so will have an impact on all queries. You also loose the direct access by orderNumber if status is not specified, you'll have to add a non-clustered index on orderNumber alone to cover that (which is usualy the PK non-clustered index).
I made all these comments w/o knowing your actual data cardinality and selectivity. If the cardinality of tbl_a and tbl_b is very skewed then things may be different. Eg. if tbl_a has 10 records with 10 distinct order numbers and tbl_b has 10M records with 10M order numbers than my advice the option 2. would make little difference, since the plan will always choose a scan of tbl_a a seek range lookups in tbl_b 10 times.
Yes, quite possibly. This is called a covering index. The entire query can be served from the index, without accessing tbl_b at all.
However, you should consider the impact on performance of other queries, particularly ones that update the status column.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With