Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why does adding an index to a database field speed up searching over that field?

I'm new to databases and have been reading that adding an index to a field you need to search over can dramatically speed up search times. I understand this reality, but am curious as to how it actually works. I've searched a bit on the subject, but haven't found any good, concise, and not over technical answer to how it works.

I've read the analogy of it being like an index at the back of a book, but in the case of a data field of unique elements (such as e-mail addresses in a user database), using the back of the book analogy would provide the same linear look up time as a non-indexed seach.

What is going on here to speed up search times so much? I've read a little bit about searching using B+-Trees, but the descriptions were a bit too indepth. What I'm looking for is a high level overview of what is going on, something to help my conceptual understanding of it, not technical details.

like image 675
Scott Lemmon Avatar asked Sep 27 '12 01:09

Scott Lemmon


People also ask

How does indexing make search faster?

Indexing makes columns faster to query by creating pointers to where data is stored within a database. Imagine you want to find a piece of information that is within a large database. To get this information out of the database the computer will look through every row until it finds it.

How do indexes affect database performance?

An index is used to speed up data search and SQL query performance. The database indexes reduce the number of data pages that have to be read in order to find the specific record. The biggest challenge with indexing is to determine the right ones for each table.

What is the purpose of adding indexes to the database?

Indexes are used to quickly locate data without having to search every row in a database table every time a database table is accessed. Indexes can be created using one or more columns of a database table, providing the basis for both rapid random lookups and efficient access of ordered records.

When you add an index to a field in a database table How are performance and storage affected?

Inserting data into the table with additional indexes was four times slower. This is a simple bulk test and other factors can contribute to the slower speed; however, this provides a representative example that adding indexes to a table has a direct effect on write performance.


2 Answers

Expanding on the search algorithm efficiencies, a key area in database performance is how fast the data can be accessed. In general, reading data from a disk is a lot lot slower than reading data from memory.

To illustrate a point, lets assume everything is stored on disk. If you need to search through every row of data in a table looking for certain values in a field, you still need to read the entire row of data from the disk to see if it matches - this is commonly referred to as a 'table scan'.

If your table is 100MB, that's 100MB you need to read from disk.

If you now index the column you want to search on, in simplistic terms the index will store each unique value of the data and a reference to the exact location of the corresponding full row of data. This index may now only be 10MB compared to 100MB for the entire table.

Reading 10MB of data from the disk (and maybe a bit extra to read the full row data for each match) is roughly 10 times faster than reading the 100MB.

Different databases will store indexes or data in memory in different ways to make these things much faster. However, if your data set is large and doesn't fit in memory then the disk speed can have a huge impact and indexing can show huge gains. In memory there can still be large performance gains (amongst other efficiencies).

In general, that's why you may not notice any tangible difference with indexing a small dataset which easily fits in memory.

The underlying details will vary between systems and actually will be a lot more complicated, but I've always found the disk reads vs. memory reads an easily understandable way of explaining this.

like image 65
Jarod Elliott Avatar answered Nov 27 '22 09:11

Jarod Elliott


Okay, after a bit of research and discussion, here is what I have learned:

Conceptually an index is a sorted copy of the data field it is indexing, where each index value points to it's original(unsorted) row. Because the database knows how values are sorted, it can apply more sophisticated search algorithms than just looking for the value from start to finish. The binary search algorithm is a simple example of a searching algorithm for sorted lists and reduces the maximum search time from O(n) to O(log n).

As a side note: A decent sorting algorithm generally will take O(n log n) to complete, which means (as we've all probably heard before) you should only put indexes on fields you will search often, as it's a bit more expensive to add the index (which includes a sort) than it is to do a full search a few times. For example, in a large database of >1,000,000 entries it's on the range of 20x more expensive to sort than to search once.

Edit: See @Jarod Elliott's answer for a more in-depth look at search efficiencies, specifically in regard to read from disk operations.

like image 34
Scott Lemmon Avatar answered Nov 27 '22 10:11

Scott Lemmon