
When should I use a composite index?

People also ask

How do you choose composite index?

In general, you should put the column expected to be used most often first in the index. You can create a composite index (using several columns), and the same index can be used for queries that reference all of these columns, or just a leading (leftmost) subset of them.
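As a rough sketch of the "most often first" rule of thumb, the snippet below orders columns by how many queries in a workload test them. The workload and the column names (category, brand, price) are hypothetical, and the heuristic deliberately ignores selectivity and range conditions, which also matter in practice.

```python
from collections import Counter


def order_by_usage(workload: list[set[str]]) -> list[str]:
    """Order columns so the one tested by the most queries comes first.

    Each query in `workload` is represented by the set of columns its WHERE
    clause tests. Ties are broken alphabetically for a deterministic result.
    """
    counts = Counter(col for query in workload for col in query)
    return sorted(counts, key=lambda col: (-counts[col], col))


# Hypothetical workload: every query tests 'category', two also test 'brand'.
workload = [
    {"category"},
    {"category", "brand"},
    {"category", "brand", "price"},
]
print(order_by_usage(workload))  # ['category', 'brand', 'price']
```

With that order, each query in this workload tests a leftmost prefix of (category, brand, price), so one composite index can serve all three.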

What is the difference between composite index and single index?

Like a single-column index, a composite index is a data structure of records sorted on something. Unlike a single-column index, that something is not one field but a concatenation of several fields. So a query that filters on class and on position = 'Top'; will have improved retrieval time, because the composite index is sorted by class, then by position.

What is the use of composite index in MySQL?

MySQL allows you to create a composite index of up to 16 columns. The query optimizer uses a composite index for queries that test all of its columns. It can also use it for queries that test only the first column, the first two columns, and so on.


You should use a composite index when you are using queries that benefit from it. A composite index that looks like this:

index( column_A, column_B, column_C )

will benefit a query that uses those fields for joining, filtering, and sometimes selecting. It will also benefit queries that use left-most subsets of columns in that composite. So the above index will also satisfy queries that need

index( column_A, column_B, column_C )
index( column_A, column_B )
index( column_A )

But it will not help queries that need the following (at least not directly; it may help partially if there are no better indexes):

index( column_A, column_C )

Notice how column_B is missing.
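A small model of that leftmost-prefix rule: walk the index columns in order and stop at the first one the query does not test. It treats every condition as an equality test and ignores ranges; it illustrates the rule and is not how the MySQL optimizer is implemented.

```python
def usable_prefix(index_cols: list[str], query_cols: set[str]) -> list[str]:
    """Return the leading index columns a query can use to narrow the search.

    Simplified model of the leftmost-prefix rule: treats every condition as an
    equality test and stops at the first index column the query does not test.
    """
    prefix = []
    for col in index_cols:
        if col not in query_cols:
            break
        prefix.append(col)
    return prefix


idx = ["column_A", "column_B", "column_C"]
print(usable_prefix(idx, {"column_A", "column_B"}))  # ['column_A', 'column_B']
print(usable_prefix(idx, {"column_A", "column_C"}))  # ['column_A']: column_B is missing
print(usable_prefix(idx, {"column_B", "column_C"}))  # []: no leading column at all
```

In the (column_A, column_C) case only column_A narrows the search; column_C can at best be checked afterwards, which is the "may help partially" caveat.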

In your original example, a composite index on the two dimensions will mostly benefit queries on both dimensions or on the leftmost dimension by itself, but not on the rightmost dimension by itself. If you always query both dimensions, a composite index is the way to go, and it most probably doesn't matter which column comes first.


Imagine you have the following three queries:

Query I:

SELECT * FROM homes WHERE `geolat`=42.9 AND `geolng`=36.4

Query II:

SELECT * FROM homes WHERE `geolat`=42.9

Query III:

SELECT * FROM homes WHERE `geolng`=36.4

If you have a separate index per column, all three queries can use an index. In MySQL, if you have a composite index on (geolat, geolng), only Query I and Query II (which uses the first part of the composite index) can use it. In this case, Query III requires a full table scan.

The Multiple-Column Indexes section of the manual explains clearly how multiple-column indexes work, so I won't retype the manual here.

From the MySQL Reference Manual page:

A multiple-column index can be considered a sorted array containing values that are created by concatenating the values of the indexed columns.

If you use separate indexes for the geolat and geolng columns, you have two different indexes in your table, which you can search independently.

INDEX geolat
-----------
VALUE RRN
36.4  1
36.4  8
36.6  2
37.8  3
37.8  12
41.4  4

INDEX geolng
-----------
VALUE RRN
26.1  1
26.1  8
29.6  2
29.6  3
30.1  12
34.7  4
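Because the two indexes are searched independently, a query on both columns could look each value up in its own index and intersect the record numbers. A minimal sketch using the sample values from the two tables, with each index modeled as a sorted Python list of (value, RRN) pairs rather than a real B-tree:

```python
from bisect import bisect_left, bisect_right

INF = float("inf")

# The two single-column indexes from the tables: sorted (value, RRN) pairs.
geolat_index = [(36.4, 1), (36.4, 8), (36.6, 2), (37.8, 3), (37.8, 12), (41.4, 4)]
geolng_index = [(26.1, 1), (26.1, 8), (29.6, 2), (29.6, 3), (30.1, 12), (34.7, 4)]


def lookup(index, value):
    """Binary-search one index and return the RRNs stored under `value`."""
    lo = bisect_left(index, (value, -INF))
    hi = bisect_right(index, (value, INF))
    return {rrn for _, rrn in index[lo:hi]}


# A geolng-only lookup just uses the geolng index.
print(sorted(lookup(geolng_index, 26.1)))  # [1, 8]

# A lookup on both columns: search each index independently, then intersect.
both = lookup(geolat_index, 37.8) & lookup(geolng_index, 29.6)
print(sorted(both))  # [3]
```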

If you use composite index you have only one index for both columns:

INDEX (geolat, geolng)
-----------
VALUE      RRN
36.4,26.1  1
36.4,26.1  8
36.6,29.6  2
37.8,29.6  3
37.8,30.1  12
41.4,34.7  4

RRN is the relative record number (to simplify, you can think of it as the ID). The first two indexes are generated separately, and the third index is composite. As you can see, you can't search on geolng alone using the composite index, since it is ordered by geolat first; however, you can search by geolat, or by "geolat AND geolng" (since geolng is the second-level key).
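To make the difference concrete, here is the composite index from the table modeled as a sorted list of ((geolat, geolng), RRN) entries. A leading-column prefix can be found with a binary search (Python's bisect module), while a lookup on geolng alone has to examine every entry. This is a simplified in-memory model, not MySQL's B-tree code.

```python
from bisect import bisect_left, bisect_right

INF = float("inf")

# Composite INDEX (geolat, geolng) from the table, sorted by (key, RRN).
composite = [
    ((36.4, 26.1), 1), ((36.4, 26.1), 8), ((36.6, 29.6), 2),
    ((37.8, 29.6), 3), ((37.8, 30.1), 12), ((41.4, 34.7), 4),
]


def seek(index, prefix):
    """Binary-search for entries whose key starts with `prefix` (leftmost columns)."""
    pad = len(index[0][0]) - len(prefix)
    low = (tuple(prefix) + (-INF,) * pad, -INF)
    high = (tuple(prefix) + (INF,) * pad, INF)
    lo, hi = bisect_left(index, low), bisect_right(index, high)
    return sorted(rrn for _, rrn in index[lo:hi])


def scan_geolng(index, lng):
    """geolng is not a leading column, so every entry has to be examined."""
    hits = sorted(rrn for (_, entry_lng), rrn in index if entry_lng == lng)
    return hits, len(index)


print(seek(composite, (36.4,)))        # [1, 8]   geolat only
print(seek(composite, (37.8, 30.1)))   # [12]     geolat AND geolng
print(scan_geolng(composite, 29.6))    # ([2, 3], 6): all 6 entries examined
```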

Also, have a look at the How MySQL Uses Indexes section of the manual.


There could be a misconception about what a composite index does. Many people think that a composite index can be used to optimise a search query as long as the WHERE clause covers the indexed columns, in your case geolat and geolng. Let's delve deeper:

I believe the coordinates of your homes would be fairly random decimals, such as:

home_id  geolat  geolng
   1    20.1243  50.4521
   2    22.6456  51.1564
   3    13.5464  45.4562
   4    55.5642 166.5756
   5    24.2624  27.4564
   6    62.1564  24.2542
...

Since geolat and geolng values hardly ever repeat, a composite index on geolat and geolng would look something like this:

index_id  geolat  geolng
   1     20.1243  50.4521
   2     20.1244  61.1564
   3     20.1251  55.4562
   4     20.1293  66.5756
   5     20.1302  57.4564
   6     20.1311  54.2542
...

Therefore the second column of the composite index is basically useless! The speed of your query with a composite index is probably going to be similar to an index on just the geolat column.
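One way to check this claim is to measure how often adjacent entries in a (geolat, geolng) index share the same geolat, since geolng only decides the order inside such runs. The data below is hypothetical (uniformly random coordinates rounded to four decimals, like the sample); clustered real-world data may repeat geolat more often.

```python
import random


def share_of_tied_neighbours(rows):
    """Fraction of adjacent (geolat, geolng) index entries that share geolat.

    geolng only decides the order *within* a run of equal geolat values, so a
    fraction near zero means the second column adds almost no ordering.
    """
    keys = sorted(rows)
    if len(keys) < 2:
        return 0.0
    ties = sum(1 for a, b in zip(keys, keys[1:]) if a[0] == b[0])
    return ties / (len(keys) - 1)


# Hypothetical data: 10,000 homes with random coordinates rounded to 4 decimals.
random.seed(42)
homes = [
    (round(random.uniform(-90, 90), 4), round(random.uniform(-180, 180), 4))
    for _ in range(10_000)
]
print(f"{share_of_tied_neighbours(homes):.4%} of neighbours share geolat")
```

For uniformly random data like this, the printed share should be well under 1%.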

As mentioned by Will, MySQL provides spatial extension support. A spatial point is stored in a single column instead of two separate lat and lng columns, and a spatial index can be applied to such a column. However, based on my personal experience, its efficiency may be overrated. It could be that a spatial index does not solve the two-dimensional problem but merely speeds up the search using R-trees with quadratic splitting.

The trade-off is that a spatial point consumes much more memory, as it uses eight-byte double-precision numbers to store its coordinates. Correct me if I am wrong.


Composite indexes are useful for

  • 0 or more "=" clauses, plus
  • at most one range clause.

A composite index cannot handle two ranges: once the index reaches a range column, the columns after it no longer narrow the search. I discuss this further in my index cookbook.
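A simplified model of the two bullet points, using a hypothetical composite index over (a, b) that holds every pair from 0 to 9. With "=" on a, the range on b also bounds the seek, so every entry read is a match; with a range on a, only a bounds the seek and the condition on b becomes a filter over everything in that slice.

```python
from bisect import bisect_left, bisect_right

INF = float("inf")


def scan(index, a_lo, a_hi, b_lo, b_hi):
    """Simulate reading a sorted (a, b) index for
    a BETWEEN a_lo AND a_hi AND b BETWEEN b_lo AND b_hi.

    Returns (entries examined, entries matched).
    """
    if a_lo == a_hi:
        # "=" on a: entries with a fixed a are sorted by b, so the b range
        # is one contiguous slice and both conditions bound the seek.
        start = bisect_left(index, (a_lo, b_lo))
        stop = bisect_right(index, (a_hi, b_hi))
    else:
        # Range on a: only a bounds the seek; across different a values the
        # matching b entries are not contiguous, so b is just a filter.
        start = bisect_left(index, (a_lo, -INF))
        stop = bisect_right(index, (a_hi, INF))
    examined = index[start:stop]
    matched = [(a, b) for a, b in examined if b_lo <= b <= b_hi]
    return len(examined), len(matched)


# Hypothetical composite index over (a, b): all pairs with a, b in 0..9.
index = sorted((a, b) for a in range(10) for b in range(10))

print(scan(index, 3, 3, 2, 5))  # (4, 4): "=" plus one range
print(scan(index, 3, 5, 2, 5))  # (30, 12): two ranges
```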

Find nearest -- If the question is really about optimizing

WHERE geolat BETWEEN ??? AND ???
  AND geolng BETWEEN ??? AND ???

then no index can really handle both dimensions.

Instead, one must 'think out of the box'. If one dimension is implemented via partitioning and the other is implemented by carefully picking the PRIMARY KEY, one can get significantly better efficiency for very large tables of lat/lng lookup. My latlng blog goes into the details of how to implement "find nearest" on the globe. It includes code.

The PARTITIONs are stripes of latitude ranges. The PRIMARY KEY deliberately starts with longitude so that the useful rows are likely to be in the same block. A Stored Routine orchestrates the messy code for doing order by... limit... and for growing the 'square' around the target until you have enough coffee shops (or whatever). It also takes care of the great-circle calculations and handling the dateline and poles.
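The core idea can be modeled in memory. This is my own simplified sketch, not the stored routine from the blog: points are bucketed into hypothetical 10-degree latitude stripes (standing in for the PARTITIONs), each stripe is kept sorted by longitude (standing in for the PRIMARY KEY that starts with longitude), and the search square doubles until it holds enough rows. It uses flat Euclidean distance in degrees, so unlike the blog's routine it does no great-circle math and does not handle the dateline or poles.

```python
import math
from bisect import bisect_left, bisect_right
from collections import defaultdict

STRIPE_DEG = 10.0  # hypothetical stripe height, standing in for a partition


def stripe_of(lat: float) -> int:
    return math.floor(lat / STRIPE_DEG)


class StripedPoints:
    def __init__(self, points):
        # points: iterable of (lat, lng, name); each stripe is sorted by lng first.
        self.stripes = defaultdict(list)
        for lat, lng, name in points:
            self.stripes[stripe_of(lat)].append((lng, lat, name))
        for rows in self.stripes.values():
            rows.sort()
        self.total = sum(len(rows) for rows in self.stripes.values())

    def box(self, lat, lng, half):
        """Rows within `half` degrees of (lat, lng) in both directions."""
        hits = []
        for s in range(stripe_of(lat - half), stripe_of(lat + half) + 1):
            rows = self.stripes.get(s, [])
            # Longitude-first order makes the lng range one contiguous slice.
            lo = bisect_left(rows, (lng - half,))
            hi = bisect_right(rows, (lng + half, math.inf))
            hits.extend(r for r in rows[lo:hi] if abs(r[1] - lat) <= half)
        return hits

    def nearest(self, lat, lng, k, half=1.0):
        """Names of the k nearest points (flat Euclidean distance in degrees)."""
        k = min(k, self.total)
        if k == 0:
            return []
        # Grow the square until it contains at least k rows.
        while len(self.box(lat, lng, half)) < k:
            half *= 2

        def dist(r):
            return math.hypot(r[0] - lng, r[1] - lat)

        # A point just outside the square can be nearer than one in its corner,
        # so re-query with the k-th smallest distance as the half-width.
        kth = sorted(dist(r) for r in self.box(lat, lng, half))[k - 1]
        best = sorted(self.box(lat, lng, kth), key=dist)
        return [r[2] for r in best[:k]]


shops = StripedPoints([
    (40.0, -74.0, "A"), (40.1, -74.05, "B"),
    (41.0, -73.0, "C"), (34.0, -118.0, "D"),
])
print(shops.nearest(40.05, -74.0, 2))  # ['A', 'B']
```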

More

I have written another blog; it compares 5 ways of doing lat/lng searches: http://mysql.rjweb.org/doc.php/latlng#representation_choices (It references the link given above as one of the 5.) One of the other ways is the following pair of indexes, which the blog points out is optimal for that particular case:

INDEX(geolat, geolng),
INDEX(geolng, geolat)

That is, it is important to have both columns in each of the two indexes, rather than single-column indexes on geolat and geolng.
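Having both orders gives the optimizer a choice. A deliberately simplified cost model of that choice: for a bounding-box query, whichever index has the narrower range on its leading column reads fewer entries. The values come from the sample homes table; the real MySQL optimizer uses its own row estimates, so treat this purely as an illustration.

```python
from bisect import bisect_left, bisect_right


def rows_in_range(sorted_values, lo, hi):
    """Count values with lo <= v <= hi using binary search."""
    return bisect_right(sorted_values, hi) - bisect_left(sorted_values, lo)


def pick_index(lats, lngs, lat_lo, lat_hi, lng_lo, lng_hi):
    """Pick the index whose leading-column range covers fewer rows.

    `lats` and `lngs` are the sorted leading-column values of
    INDEX(geolat, geolng) and INDEX(geolng, geolat) respectively.
    """
    lat_rows = rows_in_range(lats, lat_lo, lat_hi)
    lng_rows = rows_in_range(lngs, lng_lo, lng_hi)
    if lat_rows <= lng_rows:
        return "INDEX(geolat, geolng)", lat_rows
    return "INDEX(geolng, geolat)", lng_rows


# Sample homes from the earlier table (home_id 1..6).
homes = [(20.1243, 50.4521), (22.6456, 51.1564), (13.5464, 45.4562),
         (55.5642, 166.5756), (24.2624, 27.4564), (62.1564, 24.2542)]
lats = sorted(lat for lat, _ in homes)
lngs = sorted(lng for _, lng in homes)

print(pick_index(lats, lngs, 20, 25, 50, 52))  # ('INDEX(geolng, geolat)', 2)
```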