Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How does indexing improve performance when joining two tables

Tags:

sql

In one of our practice papers we are asked the question in the title.

Most of the articles I've read say that indexing improves performance of joins, but does not tell me how.

Maybe it's so obvious that it doesn't need to be stated. Indexing is essentially ordering a column right? So I guess having a column in order makes it easier to operate on. Is there more to it? Or am I overthinking it?

Thanks

like image 659
h33 Avatar asked Nov 04 '18 08:11

h33


3 Answers

There are different join algorithms in use.

A nested loops join operates by

  • For each row in an outer table
  • Find matching rows in an inner table

An index on the inner table is helpful to allow the matching rows to be efficiently found and thus avoiding having to scan the whole thing for each row from the outer table.

A merge join requires two inputs sorted by the column(s) used in the join predicate. An index can provide this order without the necessity to sort first.

like image 73
Martin Smith Avatar answered Nov 09 '22 12:11

Martin Smith


Here is a simplified discussion of the answer.

In most relational database implementations, the physical order of the rows assumes the order those rows were inserted in. So if you have a products table and you insert products with keys 1, 8, 2, 3, 12, chances are that the records will physically be stored in that order. When you run a SQL query to get the rows, you may get the rows in yet possibly a different order unless you specify ORDER BY productKey Ascending. The ordering takes place before the result is presented to you and hence takes time. For large tables, it takes a lot of time.

When you create an index on a column, the database creates a physically separate store for the indexed values. This store, has the entries placed in sorted order (either ascending or descending) as @marc_s commented. The entries are added to that store as you insert rows.

In the above example, the index will contain entries in this physical order: 1,2,3,8 and 12.

This index structure provides several benefits for queries:

  1. The structure is much smaller than the corresponding data structure, this makes scanning the entire index less demanding on storage.

  2. The entries are sorted, so responding to requests with ORDER BY returning multiple rows is straight forward and does nor require further sorting.

  3. The entries are sorted again, this helps finding a specific key value in the index structure. If the records are not sorted you would need an average of N/2 comparisons before you get a hit, whereas if the the index is used you would need about log(N) comparisons only depending on the algorithm utilized (see for example: Wiki-Binary Search.

When you perform a query on the database involving a column that has an index, the database engine employs an optimization algorithm that tells it if is good to use the index structure or not and then selects the best approach to get your data accordingly.

Indexes are not all good. Some drawbacks:

  1. Index structure take space

  2. Index maintenance takes processing time.

  3. Index entries need to be created with every insert, update or delete.

  4. The more indexes you create, the less speedy the insert processing becomes. Indexes slows performance of mass inserts, it is usually advised to drop and index before you load a table and build it after the load is complete.

  5. In some databases, index structure can get corrupted.

  6. Index performance depends on key data type and length. Indexing a 5000 character column is not very good idea, whereas integers are very efficient.

  7. Not all queries can be served well by indexes.

In summary, an index behaves very much like the "old" phone book ordered by name, you could locate a person's number if you know the name quickly. However, they have some drawbacks. In real life, for large tables, they are a must-have and the DBA is the person to consult of how to make them efficient as there are many types of indexes too.

Some references for you:

Database index - How they work.

Anatomy of SQL Index

like image 32
NoChance Avatar answered Nov 09 '22 12:11

NoChance


Most used are B-tree based indexes. From Oracle Database Online Documentation 12c:

B-trees, short for balanced trees, are the most common type of database index. A B-tree index is an ordered list of values divided into ranges. By associating a key with a row or range of rows, B-trees provide excellent retrieval performance for a wide range of queries, including exact match and range searches.

enter image description here

like image 20
Hasan Alizada Avatar answered Nov 09 '22 10:11

Hasan Alizada