
Need two indexes on a HABTM join table?

A simple has_and_belongs_to_many association:

Person has_and_belongs_to_many :products
Product has_and_belongs_to_many :persons

Are both of the following indexes helpful for optimal performance?

add_index :person_products, [:person_id, :product_id]
add_index :person_products, [:product_id, :person_id]

sscirrus asked Mar 04 '13 20:03


2 Answers

Close - you most likely want the following:

add_index :person_products, [:person_id, :product_id], :unique => true
add_index :person_products, :product_id

The :unique => true option is not strictly required; whether you want it depends on whether it makes sense for a person to be associated with the same product more than once. If you're not sure, you probably do want the :unique flag.
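
To make the effect of the :unique flag concrete, here is a minimal plain-Ruby sketch (no database involved) of what a unique index on (person_id, product_id) enforces. The JoinTable class and its link method are hypothetical names invented for this illustration; a real database would reject the duplicate row with its own uniqueness error rather than return false.

```ruby
require 'set'

# Hypothetical stand-in for the person_products join table.
# Illustration only; this is not how Rails or any database is implemented.
class JoinTable
  attr_reader :rows

  def initialize(unique:)
    @unique = unique
    @rows = []        # every stored [person_id, product_id] pair
    @seen = Set.new   # pairs already present, used for the uniqueness check
  end

  # Returns true if the row was stored, false if a unique index would reject it.
  def link(person_id, product_id)
    pair = [person_id, product_id]
    return false if @unique && @seen.include?(pair)
    @seen << pair
    @rows << pair
    true
  end
end

with_unique = JoinTable.new(unique: true)
with_unique.link(1, 2)
with_unique.link(1, 2)      # rejected: the pair already exists

without_unique = JoinTable.new(unique: false)
without_unique.link(1, 2)
without_unique.link(1, 2)   # accepted: person 1 is now linked to product 2 twice
```

Without the flag, the second link silently creates a duplicate association, which is rarely what a plain join table is meant to hold.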

The reason for this index structure is that all modern databases can use the first index for queries that filter on both person_id and product_id, regardless of the order in which the conditions appear in the query. E.g.

SELECT foo FROM bar WHERE person_id = 1 AND product_id = 2
SELECT foo FROM bar WHERE product_id = 2 AND person_id = 1

are treated as the same and the database is smart enough to use the first index.

Likewise, queries using only person_id can also be run using the first index. A multi-column b-tree index can serve queries on fewer columns than it contains, provided those columns form a leftmost prefix of the index definition.

Queries that filter only on product_id cannot be served efficiently by the first index, since that index is defined with person_id in the leftmost position. Hence you need a separate index to enable lookups on that column alone.
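
One way to picture why the leading column matters: a composite index behaves roughly like a list of pairs kept sorted by person_id first and product_id second. The plain-Ruby sketch below is only an analogy under that assumption, not how any database stores its indexes; INDEX and both method names are made up for the illustration.

```ruby
# Entries sorted by person_id, then product_id, like an index on
# (person_id, product_id). The data is invented for this sketch.
INDEX = [[1, 10], [1, 30], [2, 10], [2, 20], [3, 30]].sort.freeze

# Lookup by the leading column: binary-search to the first matching entry,
# then read neighbouring entries until person_id changes.
def products_for_person(index, person_id)
  start = index.bsearch_index { |(pid, _)| pid >= person_id }
  return [] if start.nil?
  index.drop(start).take_while { |(pid, _)| pid == person_id }.map(&:last)
end

# Lookup by the trailing column only: matching entries are scattered through
# the list, so every entry has to be examined.
def persons_for_product(index, product_id)
  index.select { |(_, prod)| prod == product_id }.map(&:first)
end

products_for_person(INDEX, 1)   # => [10, 30]
persons_for_product(INDEX, 10)  # => [1, 2]
```

Both calls return correct results, but only the first can jump straight to the relevant entries; the second is the situation a separate index on product_id avoids.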

The multi-column b-tree index property also extends to indexes with higher numbers of columns. If you had an index on (person_id, product_id, favorite_color, shirt_size), you could use that index to run queries using person_id, (person_id, product_id), etc., so long as the columns you filter on form a leftmost prefix of the definition.
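
The leftmost-prefix rule can be written down as a small plain-Ruby helper. This is a sketch of the rule as described above for equality conditions, not a model of any particular query planner; usable_prefix is a hypothetical name.

```ruby
# Returns the leading index columns that equality conditions on
# `query_columns` can use: the longest prefix of the index definition in
# which every column appears in the query. The order in which the query
# lists its columns is irrelevant.
def usable_prefix(index_columns, query_columns)
  index_columns.take_while { |col| query_columns.include?(col) }
end

index = [:person_id, :product_id, :favorite_color, :shirt_size]

usable_prefix(index, [:person_id])                   # => [:person_id]
usable_prefix(index, [:product_id, :person_id])      # => [:person_id, :product_id]
usable_prefix(index, [:product_id])                  # => []
usable_prefix(index, [:person_id, :favorite_color])  # => [:person_id]
```

An empty result corresponds to the product_id-only case: the index offers no usable leading column, so a separate index is needed. In the last call only person_id is usable, because product_id is missing from the query and breaks the prefix.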

Dave S. answered Nov 15 '22 09:11


Yes, they are helpful. But do you really need them? It all depends on what you're going to do with the table.

An index on (person_id, product_id) lets you quickly find the products belonging to a person, but it will not help you find the persons that own a certain product. Declared as unique, it also enforces uniqueness of the pair, so you should probably use it.

Separate indexes on (person_id) and (product_id) let you find both the products belonging to a person and the persons that own a certain product.

Indexes on (person_id, product_id) and (product_id, person_id) also cover both cases and will be faster, but they take more space, and inserting or updating rows takes a little (very little) longer. The time and space overhead is almost always worth it unless you have a database that you write to more often than you read from. Personally, I've seen Index Only Scans in PostgreSQL 9.2 benefit greatly from two indexes on both columns. So the real choice is between:

unique index on (col 2, col 1), unique index on (col 1, col 2)

and

unique index on (col 1, col 2), index on (col 2)

Jakub Kania answered Nov 15 '22 10:11