Create sql indexes for complex filtering

Tags: sql, sql-server, indexing, mysql

There is a table named human in a SQL database. I have a UI for this table and a filter form like this: [image: filter form]

I can set only some of the values (for instance, only age and state). If a filter item is not specified, it is not added to the SQL WHERE condition. The WHERE condition is combined in the order shown in the picture. So if I want to create indexes for all cases to get a performance boost, I need to create these indexes:

first name
last name
age
state
birthday
gender
first name + last name
first name + last name + age
first name + last name + age + state
...
state + birthday
state + birthday + gender
...
state + gender

That looks bad to me. Should I index only the most-used combinations? What do you think?

asked Jul 24 '16 13:07

mtkachenko

2 Answers

If you have the index first name + last name + age + state, you don't also need first name + last name + age, first name + last name, or first name on its own. If a user searches only on "first name" and "last name", the database will still be able to use the four-column index. As long as the columns the user filters on form a leftmost prefix of the index (the first column, the first two, and so on), the database can use the index even if not every column is specified.

For instance, with the index first name + last name + age + state, a search on "first name" and "last name" lets the database use the index to jump straight to the matching rows. However, if the user specifies "first name" and "age", or "first name" and "state", the database can only partially use the index: it jumps to the rows with matching first names and then has to scan those rows for the ones that match "age" or "state". If you want to know the technical details behind why this is true, read about database indexes and B+ trees.
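A quick way to see this leftmost-prefix behaviour is to ask a database for its query plan. The sketch below uses Python's built-in sqlite3 module purely as a convenient stand-in (an assumption on my part; the question is about MySQL / SQL Server, whose plan output looks different). The column names such as first_name and the index name idx_name_age_state are made up for illustration.

```python
import sqlite3

# In-memory SQLite database used only as a stand-in for the real server.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE human (
        id INTEGER PRIMARY KEY,
        first_name TEXT, last_name TEXT, age INTEGER,
        state TEXT, birthday TEXT, gender TEXT
    )
""")
# Hypothetical four-column index from the answer above.
conn.execute(
    "CREATE INDEX idx_name_age_state "
    "ON human (first_name, last_name, age, state)"
)

def plan(sql):
    """Return SQLite's query plan for `sql` as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The human-readable description is the fourth column of each plan row.
    return " | ".join(row[3] for row in rows)

# Filter on a leftmost prefix of the index: both columns can be used.
prefix_plan = plan(
    "SELECT * FROM human WHERE first_name = 'Homer' AND last_name = 'Simpson'"
)
# Filter skips last_name and age: only first_name can be used for the seek.
gap_plan = plan(
    "SELECT * FROM human WHERE first_name = 'Homer' AND state = 'OR'"
)
# Filter does not include the first index column at all.
no_prefix_plan = plan("SELECT * FROM human WHERE state = 'OR'")

for label, text in [("prefix", prefix_plan), ("gap", gap_plan),
                    ("no prefix", no_prefix_plan)]:
    print(label, "->", text)
```

On SQLite, the first plan typically reports a search on both first_name and last_name, the second only on first_name, and the third a full scan. In MySQL you would run EXPLAIN, and in SQL Server you would inspect the execution plan instead.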

Databases can also use multiple indexes when running a single query. If you have the indexes

`last name`
`state`
`age`

and the user searches for "last name", "state", and "age", the database will be able to use all three indexes to quickly find the matching rows for each field, then combine the results and discard rows that don't match all three conditions. If you look at an execution plan, you'll be able to see it doing this. Granted, this will be a bit slower than having a single index that contains every necessary field, but it saves you from maintaining a ton of indexes.

Also note that even if an index exists, the database may not necessarily use it, because doing a row scan may be faster. For instance, take the above example with three different indexes, and suppose the user does a search on "last name", "first name", and "state". Because the combination of "last name" and "first name" has such a high selectivity (meaning most of the values in that index are unique), it might be faster to use the index to get all the rows that match the first name and last name and then do a simple iterative scan on those rows to find the ones that also have the matching state. The alternative, using the state index as well and then joining the rows returned by both indexes, may cost more.

When you're designing your indexes, an index won't give you much of a performance boost (and may actually be worse than doing a full table scan) if the selectivity of your index is really low. Gender, for instance, is not a good field to have indexed because you only have two possible values. If the user is searching only on gender, you will never get good performance with or without indexes because you will return half your rows.

Row-for-row, a full table scan is actually faster than using an index. The reason for this is that when the database does a table scan, it is able to jump straight to the data page on disk. When it uses an index, it has to go through a few intermediate index pages before it actually gets to where the data is stored on disk. For a field like "gender" where you're going to be selecting half of your rows, the added overhead of following your index links for half the rows in the table may outweigh the cost of just scanning the entire table without using indexes.
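If you want a rough number for selectivity before deciding what to index, you can compare the count of distinct values in a column to the total row count. This is only a sketch: it uses Python's sqlite3 module as a stand-in database, the table contents are synthetic, and the helper name selectivity is my own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE human (id INTEGER PRIMARY KEY, last_name TEXT, gender TEXT)"
)
# Synthetic data: 1000 rows, every last name unique, two gender values.
rows = [(f"Name{i}", "F" if i % 2 else "M") for i in range(1000)]
conn.executemany("INSERT INTO human (last_name, gender) VALUES (?, ?)", rows)

def selectivity(column):
    """Distinct values divided by total rows (1.0 means all values unique).

    `column` is interpolated into the SQL text, so pass only trusted,
    hard-coded column names here.
    """
    distinct, total = conn.execute(
        f"SELECT COUNT(DISTINCT {column}), COUNT(*) FROM human"
    ).fetchone()
    return distinct / total

gender_selectivity = selectivity("gender")
last_name_selectivity = selectivity("last_name")
print("gender:", gender_selectivity)
print("last_name:", last_name_selectivity)
```

A value close to 1 suggests an index on that column can narrow a search to very few rows, while a value close to 0 (as for gender here) suggests the index will rarely pay for itself.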

I would recommend indexes

`first name, last name`
`birthdate`
`state`

If you have a specific combination of fields that is searched on frequently, then you can make an index for that too to speed things up. However, don't make an index for every combination of fields.

If you use "birthdate" instead of "birthday", then you don't need "age" because you can calculate that based on "birthdate" and then do a between query on "birthdate". If you're forced to have separate columns for "birthday" and "age", then you could index "age" as well. However, like another user commented below, you'd have to constantly update your ages. I strongly recommend against that design.
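To illustrate the "between query on birthdate" idea, here is a minimal sketch that turns an age filter into a birthdate range. It assumes birthdates are stored as ISO 8601 text (YYYY-MM-DD) in a SQLite stand-in table; the helper names years_before and birthdate_range_for_age are hypothetical, and people born on Feb 29 may need a convention of your own choosing.

```python
import sqlite3
from datetime import date, timedelta

def years_before(day, years):
    """Return the same calendar day `years` earlier (Feb 29 falls back to Feb 28)."""
    try:
        return day.replace(year=day.year - years)
    except ValueError:  # Feb 29 and the target year is not a leap year
        return day.replace(year=day.year - years, day=28)

def birthdate_range_for_age(age, today):
    """Inclusive (oldest, youngest) birthdates of people who are `age` on `today`."""
    youngest = years_before(today, age)  # turns `age` exactly today
    oldest = years_before(today, age + 1) + timedelta(days=1)  # turns age+1 tomorrow
    return oldest, youngest

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE human (name TEXT, birthdate TEXT)")
conn.execute("CREATE INDEX idx_birthdate ON human (birthdate)")
conn.executemany("INSERT INTO human VALUES (?, ?)", [
    ("too old", "1985-07-24"),
    ("oldest 30", "1985-07-25"),
    ("youngest 30", "1986-07-24"),
    ("too young", "1986-07-25"),
])

today = date(2016, 7, 24)  # fixed date so the example is reproducible
oldest, youngest = birthdate_range_for_age(30, today)
matches = [row[0] for row in conn.execute(
    "SELECT name FROM human WHERE birthdate BETWEEN ? AND ? ORDER BY birthdate",
    (oldest.isoformat(), youngest.isoformat()),
)]
print(matches)
```

Because the filter is a plain range on the indexed birthdate column, the database can use the birthdate index, and nothing has to be updated as people get older.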

One final thing to consider is whether to try to make a covering index. A covering index is one that contains every field a query needs, both the fields it filters on and the fields it returns. For example, suppose your table has 100 fields in it, but users are usually only interested in looking up someone's state and age based on their name. So a large percentage of your queries look something like this:

SELECT STATE, AGE FROM PEOPLE WHERE FIRSTNAME = 'Homer' AND LASTNAME = 'Simpson'

If your index is LASTNAME, FIRSTNAME, then the database will look up "Homer" and "Simpson" in your index (which will involve reading a few index pages from disk), use the index pointer to go to the disk page where the data record is stored, read that entire data page, parse it into fields, and then return the state and age.

Now, suppose you run the same query but your index is LASTNAME, FIRSTNAME, STATE, AGE. The database engine will still use your index to look up "Homer" and "Simpson", but once it finds the appropriate index record (exactly the same as how it worked above), that index record already has STATE and AGE. Therefore, the database can get the results of your query straight from the index without having to also read the data page from disk.
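The difference between the two indexes shows up in a query plan. The sketch below again uses Python's sqlite3 module as a stand-in (an assumption; MySQL shows this as "Using index" in EXPLAIN, and SQL Server shows no key lookup in the plan). The table definition and the index names idx_name and idx_name_state_age are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE people (
        id INTEGER PRIMARY KEY,
        firstname TEXT, lastname TEXT, state TEXT, age INTEGER,
        notes TEXT  -- stands in for the many other columns of a wide table
    )
""")

def plan(sql):
    """Return SQLite's query plan for `sql` as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)

query = (
    "SELECT state, age FROM people "
    "WHERE firstname = 'Homer' AND lastname = 'Simpson'"
)

# Index on the name only: the row must still be fetched from the table.
conn.execute("CREATE INDEX idx_name ON people (lastname, firstname)")
plain_plan = plan(query)

# Replace it with an index that also carries the selected columns.
conn.execute("DROP INDEX idx_name")
conn.execute(
    "CREATE INDEX idx_name_state_age "
    "ON people (lastname, firstname, state, age)"
)
covering_plan = plan(query)

print("name index     ->", plain_plan)
print("covering index ->", covering_plan)
```

With the wider index, SQLite labels the access as a covering index search, meaning it answers the query from the index alone without visiting the table row.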

A situation where a covering index can drastically improve performance is in the case of table scans. Assume you have 100 fields in your table (so the size of a single row is a few hundred bytes or more). Now a user runs the query

SELECT FIRSTNAME, LASTNAME, AGE FROM PEOPLE

The database would have to read the entire table (including all the fields that aren't needed for this query) to get your results. If you had an index LASTNAME, FIRSTNAME, AGE, then the database could get the results by scanning your entire index instead of scanning the entire table. Since in this case a single index entry is far smaller byte-wise than a single data row, the query will be much faster.

In your particular case with so few fields in your table, a covering index probably wouldn't be very useful since the fields in the index would be the same as the fields in your table, thus defeating the whole purpose. However, for a table with dozens of fields, of which only a handful are commonly queried, a covering index can be a great way to speed up your queries.

answered Sep 19 '22 16:09

Ben Rubin

Lots of indexes is a 'bad' idea.
Indexes on individual columns won't help much.
One index that is a 'prefix' of another is redundant.
An index on a flag or column of low 'cardinality' (e.g. gender) won't be used.

Suggestion: Start with one index per column. Then add on a second column to each index. Pick this second column based on what is likely to be tested together. Avoid having both (a,b) and (b,a).

Then watch what types of queries are generated by 'real' users. Tweak the list of indexes accordingly. This info may lead to a few 3-column indexes.

answered Sep 19 '22 16:09

Rick James
