
Order of multi column index in rails polymorphic association

I have a table with a polymorphic reference that I generated via the following migration:

def change
  add_reference :table_name, :thing, polymorphic: true, index: true
end

When I ran the migration it generated the following:

add_index "workflow_engine_task_bases", ["thing_type", "thing_id"], name: "index_workflow_engine_task_bases_on_thing_type_and_thing_id", using: :btree

Why is thing_type the leftmost column? That seems suboptimal to me, since it is less selective than thing_id.

user2977636 asked Dec 23 '22 20:12


1 Answer

A Rails commit by Derek Prior changed add_reference to put the type column before the id column when it generates an index for a polymorphic association. The justification from the commit message is reproduced below:

Use type column first in multi-column indexes

add_reference can very helpfully add a multi-column index when you use it to add a polymorphic reference. However, the first column in the index is the id column, which is less than ideal.

The [PostgreSQL docs][1] say:

A multicolumn B-tree index can be used with query conditions that involve any subset of the index's columns, but the index is most efficient when there are constraints on the leading (leftmost) columns.

The [MySQL docs][2] say:

MySQL can use multiple-column indexes for queries that test all the columns in the index, or queries that test just the first column, the first two columns, the first three columns, and so on. If you specify the columns in the right order in the index definition, a single composite index can speed up several kinds of queries on the same table.

In a polymorphic relationship, the type column is much more likely to be useful as the first column in an index than the id column. That is, I'm more likely to query on type without an id than I am to query on id without a type.

[1]: http://www.postgresql.org/docs/9.3/static/indexes-multicolumn.html

[2]: http://dev.mysql.com/doc/refman/5.0/en/multiple-column-indexes.html

I think this order makes sense in most scenarios. It gives you a single index that performs well for queries that filter on both type and id, and for queries that filter on type alone.
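To make the leftmost-column reasoning concrete, here is a minimal Ruby sketch. It is a simplified model of the leftmost-prefix rule described in the MySQL quote, not how any real query planner works, and usable_prefix is a hypothetical helper written only for this illustration. As the PostgreSQL quote notes, a real database may still use an index without a constraint on the leading column, just less efficiently.

```ruby
# Simplified model of the leftmost-prefix rule: an index helps with the
# longest leading run of its columns that the query actually filters on.
# `usable_prefix` is a hypothetical helper, not part of Rails or any database.
def usable_prefix(index_columns, query_columns)
  index_columns.take_while { |col| query_columns.include?(col) }
end

# The two possible column orders for the polymorphic index.
TYPE_FIRST = %w[thing_type thing_id].freeze
ID_FIRST   = %w[thing_id thing_type].freeze

# Columns each kind of query filters on.
queries = {
  "type and id" => %w[thing_type thing_id],
  "type only"   => %w[thing_type],
  "id only"     => %w[thing_id]
}

queries.each do |label, cols|
  puts "#{label}: type-first uses #{usable_prefix(TYPE_FIRST, cols).inspect}, " \
       "id-first uses #{usable_prefix(ID_FIRST, cols).inspect}"
end
```

Under this model both orders fully serve a query on type and id. The type-first index also serves a type-only query, while the id-first index offers it no usable prefix. The reverse holds for an id-only query, which the commit message argues is the rarer case for a polymorphic association.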

Having said that, the best order depends on the database you use, your dataset, and the queries you plan to run. Your best bet is to profile your most common queries against a production dump and choose your indexing strategy accordingly.

O-I answered Mar 16 '23 18:03