Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Oracle: does the column order matter in an index?

Tags:

An index on two columns can be created with either of the statements

create index foo_ix on foo(a,b); create index foo_ix on foo(b,a); 
  1. How does this affect the operational (runtime) characteristics of using the index?

  2. How does this affect the layout (physical) characteristics of the index?

  3. Are either (1) or (2) affected by the types/sizes of the columns?

  4. What are the best practices for creating multi-column indexes?

In short, does it matter which column I put first?

like image 290
Mark Harrison Avatar asked Feb 04 '10 00:02

Mark Harrison


People also ask

Does the order of included columns in an index matter?

Selectivity of the individual columns in a composite index does not matter when picking the order. Here is the simple thought process: Effectively, an index is the concatenation of the columns involved.

Should ORDER BY column be indexed?

Using ORDER BY on indexed column is not a good idea. Actually the purpose of using index is to making searching faster so the index column helps to maintain the data in sorted order.

Do indexes affect ORDER BY?

Yes, index will help you, when using ORDER BY. Because INDEX is a sorted data structure, so the request will be executed faster. Look at this example: table test2 with 3 rows.

What should the order of columns be in the composite index?

Execution is most efficient when you create a composite index with the columns in order from most to least distinct. In other words, the column that returns the highest count of distinct rows when queried with the DISTINCT keyword in the SELECT statement should come first in the composite index.


2 Answers

  1. If a and b both have 1000 distinct values and they are always queried together then the order of columns in the index doesn't really matter. But if a has only 10 distinct values or you have queries which use just one of the columns then it does matter; in these scenarios the index may not be used if the column ordering does not suit the query.
  2. The column with the least distinct values ought to be first and the column with the most distinct values last. This not only maximises the utility of the index it also increases the potential gains from index compression.
  3. The datatype and length of the column have an impact on the return we can get from index compression but not on the best order of columns in an index.
  4. Arrange the columns with the least selective column first and the most selective column last. In the case of a tie lead with the column which is more likely to be used on its own.

The one potential exception to 2. and 3. is with DATE columns. Because Oracle DATE columns include a time element they might have 86400 distinct values per day. However most queries on a data column are usually only interested in the day element, so you might want to consider only the number of distinct days in your calculations. Although I suspect it won't affect the relative selectivity in but a handful of cases.

edit (in response to Nick Pierpoint's comment)

The two main reasons for leading with the least selective column are

  1. Index compression
  2. Index Skip reads

Both these work their magic from knowing that the value in the current slot is the same as the value in the previous slot. Consequently we can maximize the return from these techniques by minimsing the number of times the value changes. In the following example, A has four distinct values and B has six. The dittos represent a compressible value or a skippable index block.

Least selective column leads ...  A          B ---------  - AARDVARK   1 "          2 "          3 "          4 "          5 "          6 DIFFVAL    1 "          2 "          3 "          4 "          5 "          6 OTHERVAL   1 "          2 "          3 "          4 "          5 "          6 WHATEVER   1 "          2 "          3 "          4 "          5 "          6 

Most selective column leads ...

B  A -  -------- 1  AARDVARK "  DIFFVAL "  OTHERVAL "  WHATEVER 2  AARDVARK "  DIFFVAL "  OTHERVAL "  WHATEVER 3  AARDVARK "  DIFFVAL "  OTHERVAL "  WHATEVER 4  AARDVARK "  DIFFVAL "  OTHERVAL "  WHATEVER 5  AARDVARK "  DIFFVAL "  OTHERVAL "  WHATEVER 6  AARDVARK "  DIFFVAL "  OTHERVAL "  WHATEVER 

Even in this trival example, (A, B) has 20 skippable slots compared to the 18 of (B, A). A wider disparity would generate greater ROI on index compression or better utility from Index Skip reads.

As is the case with most tuning heuristics we need to benchmark using actual values and realistic volumes. This is definitely a scenario where data skew could have a dramatic impact of the effectiveness of different approaches.


"I think if you have a highly selective first index then - from a performance perspective - you'll do well to put it first."

If we have a highly selective column then we should build it an index of its own. The additional benefits of avoiding a FILTER operation on a handful of rows is unlikely to be outweighed by the overhead of maintaining a composite index.

Multi-column indexes are most useful when we have:

  • two or more columns of middling selectivity,
  • which are frequently used in the same query.
like image 53
APC Avatar answered Sep 18 '22 01:09

APC


But according to Oracle itself, it's better to put the column with the highest cardinality first:

http://docs.oracle.com/cd/B10500_01/server.920/a96533/data_acc.htm#2174

Ordering Keys for Composite Indexes

If all keys are used in WHERE clauses equally often, then ordering these keys from most selective to least selective in the CREATE INDEX statement best improves query performance.

like image 37
DVr Avatar answered Sep 19 '22 01:09

DVr