Should Join Tables typically be created as Index Organized Tables (Clustered Indexes)?

Tags: sql-server, indexing, clustered-index, oracle

Generally speaking, should join tables (i.e. associative tables) be created as Index Organized Tables (Oracle) or Clustered Indexes (SQL Server), or as plain old heap tables (with separate indexes on the two columns)?

The way I see it, the advantages are:

Speed improvement. You're avoiding a heap table lookup.

Space improvement. You're eliminating the heap table altogether, so you're probably saving ~30% of the space.
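A rough way to see the space argument is to build both layouts in SQLite, where a WITHOUT ROWID table plays the role of an index-organized table / clustered index and an ordinary rowid table plays the heap. This is a sketch under those assumptions: the table name link, the row count and the data are made up, and SQLite page counts say nothing precise about Oracle or SQL Server storage, so the ratio shouldn't be read as confirming the ~30% figure.

```python
import sqlite3

# Heap-style layout from the question: a plain table plus one index per column.
# Assumption: SQLite has no true heap; a rowid table (B-tree on a hidden rowid) stands in.
HEAP_DDL = """
CREATE TABLE link (a INTEGER NOT NULL, b INTEGER NOT NULL);
CREATE INDEX link_a ON link (a);
CREATE INDEX link_b ON link (b);
"""

# Index-organized / clustered layout: the rows live in the (a, b) primary-key B-tree.
CLUSTERED_DDL = """
CREATE TABLE link (
    a INTEGER NOT NULL,
    b INTEGER NOT NULL,
    PRIMARY KEY (a, b)
) WITHOUT ROWID;
"""


def page_count(ddl: str, rows: int = 20_000) -> int:
    """Create one layout in a fresh in-memory database, fill it, and return its size in pages."""
    con = sqlite3.connect(":memory:")
    con.executescript(ddl)
    # a = i // 50 and b = i % 50 give every row a distinct (a, b) pair.
    con.executemany(
        "INSERT INTO link (a, b) VALUES (?, ?)",
        ((i // 50, i % 50) for i in range(rows)),
    )
    con.commit()
    (pages,) = con.execute("PRAGMA page_count").fetchone()
    con.close()
    return pages


heap_pages = page_count(HEAP_DDL)
clustered_pages = page_count(CLUSTERED_DDL)
print(f"heap + 2 indexes: {heap_pages} pages; clustered: {clustered_pages} pages")
```

The clustered layout should come out noticeably smaller, because the heap version stores each pair in the table and again in two indexes, each of which also carries the hidden rowid.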

The disadvantages:

An Index Skip Scan (which only applies to Oracle) will be faster than a Full Table Scan, but slower than an Index Scan. So searches on the second column of the compound key will be slightly slower in Oracle, and much slower in SQL Server, which has no skip scan and falls back to a full scan.
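The second-column penalty is easy to see in a query plan. This sketch again uses SQLite as a stand-in (hypothetical table link, clustered on (a, b) via WITHOUT ROWID). SQLite only considers a skip scan after ANALYZE has gathered statistics, so with no statistics it behaves like the SQL Server case and falls back to a full scan. The exact plan wording varies between SQLite versions, so only the SEARCH/SCAN keyword is worth relying on.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Clustered on (a, b) only; there is no index that leads with b.
con.execute(
    "CREATE TABLE link (a INTEGER NOT NULL, b INTEGER NOT NULL, "
    "PRIMARY KEY (a, b)) WITHOUT ROWID"
)


def plan(sql: str) -> str:
    """Return the detail text of SQLite's EXPLAIN QUERY PLAN output as one string."""
    rows = con.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The human-readable detail is the last column of each plan row.
    return " | ".join(row[-1] for row in rows)


by_a = plan("SELECT b FROM link WHERE a = 1")  # leading key column: B-tree search
by_b = plan("SELECT a FROM link WHERE b = 1")  # second key column only: full scan
print(by_a)
print(by_b)
```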

A Full Index Scan will be slower than a Full Table Scan, so if the Cost Based Optimizer is doing Hash Joins most of the time (and hash joins don't take advantage of indexes), you could expect worse performance, assuming the RDBMS doesn't filter the tables first.

Which makes me question whether any type of index is really required on join tables if you're predominantly going to be doing Hash Joins.
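To make the hash-join point concrete, here is a toy in-memory hash join. It is a simplified model (no partitioning, no spilling to disk), and the row dicts and column names (students, enrollment, student_id) are hypothetical. It shows why a hash join gets nothing from a B-tree seek: each input is read in full, exactly once. It doesn't model where those rows come from, which could be a table scan or a full scan of a narrower index.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List

Row = Dict[str, Any]


def hash_join(build: Iterable[Row], probe: Iterable[Row], key: str) -> List[Row]:
    """Toy equi-join on `key`: build a hash table from one input, then probe it with the other."""
    # Build phase: every build row is read exactly once and bucketed by its join key.
    buckets: Dict[Any, List[Row]] = defaultdict(list)
    for row in build:
        buckets[row[key]].append(row)

    # Probe phase: every probe row is read exactly once; matches come from the
    # in-memory hash table, not from an index seek into the build input.
    joined: List[Row] = []
    for row in probe:
        for match in buckets.get(row[key], []):
            joined.append({**match, **row})
    return joined


# Hypothetical data: a parent table and a join table referencing it.
students = [{"student_id": 1, "name": "Ann"}, {"student_id": 2, "name": "Bo"}]
enrollment = [
    {"student_id": 1, "course_id": 10},
    {"student_id": 1, "course_id": 11},
    {"student_id": 3, "course_id": 10},  # no matching student, so it drops out
]

result = hash_join(students, enrollment, "student_id")
print(result)
```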


asked Jan 01 '12 21:01

vicsz

2 Answers

My personal rule of thumb is to create two-table associative entities as index-organized tables, with the primary key constraint matching the access "direction" I expect to be used more commonly. I'll then generally add a unique index covering the reverse order of the keys, so in all cases the optimizer should be able to use unique-scan or range-scan access.
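Here is a sketch of that layout, with SQLite's WITHOUT ROWID table standing in for an Oracle IOT (the table name link and index name link_ba are hypothetical; in Oracle itself this would be an ORGANIZATION INDEX table plus a unique index, not shown here). With the reverse unique index in place, both access directions should be answered by a B-tree search rather than a scan.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript(
    """
    CREATE TABLE link (
        a INTEGER NOT NULL,
        b INTEGER NOT NULL,
        PRIMARY KEY (a, b)           -- the more common access direction
    ) WITHOUT ROWID;
    CREATE UNIQUE INDEX link_ba ON link (b, a);  -- covers the reverse direction
    """
)


def plan(sql: str) -> str:
    """Return the detail text of SQLite's EXPLAIN QUERY PLAN output as one string."""
    rows = con.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)


by_a = plan("SELECT b FROM link WHERE a = 1")  # served by the primary-key B-tree
by_b = plan("SELECT a FROM link WHERE b = 1")  # served by the (b, a) unique index
print(by_a)
print(by_b)
```

In the plan text, the b-side lookup should name the covering index link_ba.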

Three-table (or more) associative entities generally require significantly more analysis.

Also, the optimizer will use indexes with hash join operations: generally fast full scans, but indexes nonetheless.


answered Sep 20 '22 16:09

Adam Musch

I'd just list and talk through a few possible solutions, which will hopefully help you decide. A "union table" contains two or three columns: a foreign key to the left table, say a; a foreign key to the right table, say b; and, optionally, a row identity for the union table itself, say id.

Solution 1: Columns a, b. No clustered index (a heap), indexes on (a,b) and (b,a)
Both columns are stored in three places. It supports seeks on both a and b, and the seek on b does not require a bookmark lookup, since a is part of the (b,a) index. A decent choice, but the triple storage seems like a waste: the heap is never read, yet it has to be maintained during insert and update queries.

Solution 2: Columns a, b. Clustered index on (a,b), index on (b,a)
All data is stored twice. It can serve seeks on both a and b without a bookmark lookup. This would be the best-practice approach: it trades disk storage for speed.

Solution 3: Columns a, b. Clustered index on (a,b)
All data is stored only once. It can serve a seek on a, but not on b. Going from the right to the left table will require a table scan. This trades speed for disk space. (Your question mentions hash join. A hash join always does a full scan.)

Solution 4: Columns id, a, b. Clustered index on (id), indexes on (a) and (b)
Seeks on a or b both require a bookmark lookup to fetch the other column. Both a and b are stored twice on disk: once in their own index and once in the clustered index. This is the worst solution I could think of.

This list is by no means exhaustive. Solution 2 would be a good default choice. I'd go for that unless another solution proved itself to be significantly better in tests.
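To put rough numbers on the storage side of these four layouts, the sketch below builds each one in SQLite and compares page counts. Assumptions: SQLite has no true heap, so an ordinary rowid table (a B-tree on a hidden rowid) stands in for it; a WITHOUT ROWID table stands in for a clustered index; and the table name link and the generated data are hypothetical. It says nothing about seek speed or about SQL Server's actual sizes.

```python
import sqlite3

LAYOUTS = {
    # Solution 1: heap, indexes on (a,b) and (b,a)
    1: """CREATE TABLE link (a INTEGER NOT NULL, b INTEGER NOT NULL);
          CREATE INDEX link_ab ON link (a, b);
          CREATE INDEX link_ba ON link (b, a);""",
    # Solution 2: clustered on (a,b), index on (b,a)
    2: """CREATE TABLE link (a INTEGER NOT NULL, b INTEGER NOT NULL,
                             PRIMARY KEY (a, b)) WITHOUT ROWID;
          CREATE INDEX link_ba ON link (b, a);""",
    # Solution 3: clustered on (a,b) only
    3: """CREATE TABLE link (a INTEGER NOT NULL, b INTEGER NOT NULL,
                             PRIMARY KEY (a, b)) WITHOUT ROWID;""",
    # Solution 4: clustered on a surrogate id (the rowid), indexes on (a) and (b)
    4: """CREATE TABLE link (id INTEGER PRIMARY KEY,
                             a INTEGER NOT NULL, b INTEGER NOT NULL);
          CREATE INDEX link_a ON link (a);
          CREATE INDEX link_b ON link (b);""",
}


def page_count(ddl: str, rows: int = 20_000) -> int:
    """Create one layout in a fresh in-memory database, fill it, and return its size in pages."""
    con = sqlite3.connect(":memory:")
    con.executescript(ddl)
    # Distinct (a, b) pairs; for solution 4 the id column is assigned automatically.
    con.executemany(
        "INSERT INTO link (a, b) VALUES (?, ?)",
        ((i // 50, i % 50) for i in range(rows)),
    )
    con.commit()
    (pages,) = con.execute("PRAGMA page_count").fetchone()
    con.close()
    return pages


sizes = {n: page_count(ddl) for n, ddl in LAYOUTS.items()}
for n, pages in sizes.items():
    print(f"Solution {n}: {pages} pages")
```

Solution 3 should come out smallest, then 2, then 1, which matches the "stored once / twice / three times" reasoning; where solution 4 lands depends on the engine's per-row and per-index overhead.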

answered Sep 18 '22 16:09

Andomar
