Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Understanding Indexes in MySQL

I am trying to understand indexes in MySQL. I know that an index created in a table can speed up executing queries and it can slow down the inserting and updating of rows.

When creating an index, I used this query on a table called authors that contains (AuthorNum, AuthorFName, AuthorLName, ...)

Create index Index_1 on Authors ([What to put here]);

I know I have to put a column name, but which one?

Do I have to put the column name that will be compared in the Where statement when a user query the Table or what?

like image 530
MrCSharp Avatar asked Oct 30 '13 12:10

MrCSharp


2 Answers

The Anatomy of an Index

An index is a distinct data structure within a database and is data redundancy. Its primary purpose is to provide an ordered representation of the indexed data through a logical ordering which is independent of the physical ordering. We do this using a doubly linked list and a tree structure known as the balanced search tree (B-tree). B-trees are nice because they keep data sorted and allow searches, access, insertions, and deletions in logarithmic time. Because of the doubly linked list, we are able to go backwards or forwards as needed on the index for various queries easily. Inserts become simple since we only have to rearrange pointers to the different pieces of data. Databases use these doubly linked list to connect leaf nodes (usually in a B+ tree or B-tree), each of which are stored in a page, and to establish logical ordering between the leaf nodes. Operations like UPDATE or INSERT become slower because they are actually two writing operations in the filesystem (one for the table data and one for the index data).

Defining an Optimal Index With WHERE

To define an optimal index you must not only understand how indexes work, but you must also understand how the application queries the data. E.g., you must know the column combinations that appear in the WHERE clause.

A common restriction with queries on LAST_NAME and FIRST_NAME columns deals with case sensitivity. For example, instead of doing an exact search like Hotinger we would prefer to match all results such as HoTingEr and so on. This is very easy to do in a WHERE clause: we just say WHERE UPPER(LAST_NAME) = UPPER('Hotinger')

However, if we define an index of LAST_NAME and query, it will actually run a full table scan because the query is not on LAST_NAME but on UPPER(LAST_NAME). From the database's perspective, this is completely different. So, in this case you should define the index on UPPER(LAST_NAME) instead.

Indexes do not necessarily have to be for one column. For example, if the primary key is a composite key (consisting of multiple columns) it will create a concatenated index also known as a combined index. Note that the ordering of the concatenated index has a significant impact on its usability and scalability so it must be chosen carefully. Basically, the ordering should match the way it is ordered in the WHERE clause.

Defining an Optimal Index With LIKE

The position of the wildcard characters makes a huge difference. LIKE clauses only use the characters before the wildcard during tree traversal; the rest do not narrow the scanned index range. The more selective the prefix of the LIKE clause the more narrow the scanned index becomes. This makes the index lookup faster. As a tip, avoid LIKE clauses which lead with wildcards like "%OTINGER%" For full-text searches, MySQL offers MATCH and AGAINST keywords. Starting with MySQL 5.6, you can have full-text indexes. Look at Full-Text Search Functions from MySQL for more in-depth discussion on indexing these results.

like image 108
Eric Hotinger Avatar answered Sep 24 '22 20:09

Eric Hotinger


Yes, generally you need an index on the column or columns that you compare in the WHERE clause of your queries to speed up queries.

If you search by AuthorFName, then you create an index on that column. If they search by AuthorLName, then you create an index on that column.

In this case though, maybe what you should be looking at is a FULLTEXT index. That would allow users to enter fuzzy queries, which would return a number of results ordered by relevance.

From the MySQL Manual:

Indexes are used to find rows with specific column values quickly. Without an index, MySQL must begin with the first row and then read through the entire table to find the relevant rows. The larger the table, the more this costs. If the table has an index for the columns in question, MySQL can quickly determine the position to seek to in the middle of the data file without having to look at all the data. If a table has 1,000 rows, this is at least 100 times faster than reading sequentially. If you need to access most of the rows, it is faster to read sequentially, because this minimizes disk seeks.

like image 20
Gustav Bertram Avatar answered Sep 25 '22 20:09

Gustav Bertram