With performance improvements in mind, I was wondering whether indexes are helpful on a join table, and if so, which ones (specifically, a join table used in a Rails 3 has_and_belongs_to_many context).
My models are Foo and Bar, and per Rails convention, I have a join table called bars_foos. There is no primary key and no timestamps, making the only fields in this table bar_id:integer and foo_id:integer. I'm interested in knowing which of the following indexes is best and avoids duplication:
add_index :bars_foos, [:bar_id, :foo_id]
add_index :bars_foos, :bar_id
add_index :bars_foos, :foo_id
Basically, I'm not sure whether the compound index alone is enough, assuming it is helpful to begin with. I believe a compound index can be used as a single-column index for its first (leftmost) column, which is why I'm fairly sure that adding all three would result in unnecessary duplication.
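For context, here is a sketch of the migration I have in mind for the join table itself (a standard id-less HABTM table, using the names from above):

create_table :bars_foos, :id => false do |t|
  t.integer :bar_id, :null => false
  t.integer :foo_id, :null => false
end

# Option 1: the compound index. As I understand it, this can also serve
# queries that filter on bar_id alone (the leftmost column), but not
# queries that filter only on foo_id.
add_index :bars_foos, [:bar_id, :foo_id]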
The most common usage will be: given an instance of model Foo, I will ask for its associated bars using the Rails syntax foo.bars, and vice versa with bar.foos for an instance of the model Bar.
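For completeness, the models are wired up in the standard way; a minimal sketch of my setup:

class Foo < ActiveRecord::Base
  has_and_belongs_to_many :bars
end

class Bar < ActiveRecord::Base
  has_and_belongs_to_many :foos
end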
These will generate queries of the form SELECT * FROM bars_foos WHERE foo_id = ? and SELECT * FROM bars_foos WHERE bar_id = ? respectively, with the resulting IDs then used in SELECT * FROM bars WHERE id IN (?) and SELECT * FROM foos WHERE id IN (?).
Please correct me in the comments if I'm wrong, but I do not believe that, in the context of the Rails application, it will ever issue a query specifying both IDs, like SELECT * FROM bars_foos WHERE bar_id = ? AND foo_id = ?.
In the event there are database-specific optimization techniques, I will most likely be using PostgreSQL. However, others using this code may want to run it on MySQL or SQLite, depending on their Rails configuration, so all answers are appreciated.
Indexes can help improve the performance of a nested-loop join in several ways. The biggest benefit often comes when you have a clustered index on the joining column in one of the tables. The presence of a clustered index on a join column frequently determines which table SQL Server chooses as the inner table.
In other situations, simple sequential full-table scans followed by a hash join will usually be much faster.
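If you want to see which join strategy your database actually picks, inspect the query plan. A rough sketch from the Rails console (PostgreSQL EXPLAIN syntax; the SQL mirrors what foo.bars generates, and the foo_id value is made up):

# Look for "Nested Loop", "Hash Join", or "Merge Join" in the output.
plan = ActiveRecord::Base.connection.execute(
  "EXPLAIN SELECT bars.* FROM bars " +
  "INNER JOIN bars_foos ON bars.id = bars_foos.bar_id " +
  "WHERE bars_foos.foo_id = 1"
)
plan.each { |row| puts row }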
Indexes slow down mass inserts; it is usually advised to drop an index before you load a table and rebuild it after the load is complete. In some databases, the index structure can become corrupted. Index performance also depends on the key's data type and length.
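In Rails terms, that drop-and-rebuild pattern might look like the following sketch (index and table names are from the question; whether it pays off depends on the size of the load):

# Drop the index, do the mass insert, then rebuild the index once.
remove_index :bars_foos, :foo_id
# ... perform the bulk insert here, e.g. multi-row INSERTs or COPY ...
add_index :bars_foos, :foo_id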
TL;DR: the most efficient join is also the simplest join: relational algebra. If you wish to learn more about all of the join methods, read further. Relational algebra is the most common way of writing a query and also the most natural way to do so.
The oft-repeated answer, which holds more often than not, is: "it depends." More specifically, it depends on what your data is and how it will be used.
The short tl;dr answer for my specific case (and to cover all future bases) is choice #2, which is what I suspected. However, choice #3 would also work just fine, since, depending on how I use the data, the extra time and space spent creating the compound index could pay off in faster query lookups later.
The reason for this is that databases try to be smart and to do things as fast as possible regardless of programmer input. The most basic question to ask when adding an index is: will this object be looked up by this key? If yes, an index can potentially help speed that up. However, whether the index is ever used comes down to the selectivity and cardinality of the field.
Since foreign keys are typically the IDs of another AR class, cardinality will usually be high. But again, this depends on your data. In my example, if there are many Foos but few Bars, many of the entries in my join table will have similar bar_ids. With bar_ids having a low cardinality, an index on bar_id may never be used, and may get in the way by making the database devote time and resources* to updating the index every time a new bars_foos entry is created. The same goes for many Bars and few Foos, and for few of both.
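If you are unsure how your data shakes out, you can measure cardinality directly. A rough check from the Rails console (raw SQL, since a HABTM join table has no model of its own):

conn = ActiveRecord::Base.connection
total    = conn.select_value("SELECT COUNT(*) FROM bars_foos").to_i
distinct = conn.select_value("SELECT COUNT(DISTINCT bar_id) FROM bars_foos").to_i
# A distinct count far below the total row count means low cardinality.
puts "bar_id: #{distinct} distinct values across #{total} rows"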
The general lesson is that when considering an index on a table, decide whether the entries will be looked up by this field and whether this field has a high cardinality, that is, whether it has many distinct values. In the case of most join tables, "it depends," and we must think more carefully about what the data represents and about the relationships themselves. In my case, I will have many of both Foos and Bars and will be looking up Foos by their associated bars and vice versa.
Another good answer I got at the office was, "why are you worrying about your indexes? Build your app!"
* In a similar question about indexes on STI, it was pointed out that the cost of an index is very low, so when in doubt, just add it.
It depends on how you are going to query the data.
Assuming you want to search for all of these...
WHERE bar_id = ?
WHERE foo_id = ?
WHERE bar_id = ? AND foo_id = ?
...then you should probably go with an index on {bar_id, foo_id} and an index on {foo_id}.
While you could also create a third index on {bar_id}, the price of maintaining the additional index would probably outweigh the benefit of better clustering in the smaller index.
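In Rails migration terms, that combination would look something like this sketch:

add_index :bars_foos, [:bar_id, :foo_id]  # serves bar_id-only and combined lookups
add_index :bars_foos, :foo_id             # serves foo_id-only lookups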
Also, how do you plan to cover your queries with indexes? Some of the alternatives, such as...
{foo_id, bar_id} and {bar_id}
{foo_id, bar_id} and {bar_id, foo_id}
...might cover certain kinds of queries better.
Covering is a balancing act - sometimes adding a field to an index just for covering purposes is justified, sometimes it's not. You won't know until you measure on realistic amounts of data.
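One way to measure is to check whether a query is covered by looking at its plan; on PostgreSQL 9.2+, a fully covered query can show up as an index-only scan. A sketch (the foo_id value is made up):

# With an index on {foo_id, bar_id} in place, look for "Index Only Scan".
ActiveRecord::Base.connection.execute(
  "EXPLAIN SELECT bar_id FROM bars_foos WHERE foo_id = 1"
).each { |row| puts row }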
(Disclaimer: I'm not familiar with Ruby. This answer is purely from the database perspective.)