 

Create SQL indexes for complex filtering

There is a table `human` in my SQL database. I have a UI for this table with a filter form (first name, last name, age, state, birthday, gender).

The user can fill in only some of the fields (for instance, only age and state). If a filter field is not specified, it isn't added to the SQL WHERE condition. The WHERE condition is built in the order the fields appear on the form. So if I want indexes for every case to get a performance boost, I would need to create these indexes:

  • first name
  • last name
  • age
  • state
  • birthday
  • gender
  • first name + last name
  • first name + last name + age
  • first name + last name + age + state
  • ...
  • state + birthday
  • state + birthday + gender
  • ...
  • state + gender

This looks bad to me. Should I create indexes only for the most-used combinations? What do you think?
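For reference, the "only add the fields that were filled in" behaviour described above might look something like this in Python with the built-in sqlite3 module. This is a minimal sketch: the snake_case column names and the `human` table layout are assumptions, values are passed as bound parameters, and column names come only from a fixed whitelist so user input never ends up in the SQL text.

```python
import sqlite3

# Filter fields in the order they appear on the form. These column names are assumptions.
FILTER_COLUMNS = ["first_name", "last_name", "age", "state", "birthday", "gender"]


def build_query(filters):
    """Build a parameterized SELECT that only includes the filters the user filled in."""
    clauses, params = [], []
    for column in FILTER_COLUMNS:  # fixed order; column names come only from this whitelist
        value = filters.get(column)
        if value is not None:  # unspecified fields are left out of the WHERE clause
            clauses.append(f"{column} = ?")
            params.append(value)
    sql = "SELECT * FROM human"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql, params


def search(conn: sqlite3.Connection, filters):
    """Run the generated query against an open sqlite3 connection."""
    sql, params = build_query(filters)
    return conn.execute(sql, params).fetchall()
```

For example, `build_query({"age": 30, "state": "TX"})` produces `SELECT * FROM human WHERE age = ? AND state = ?` with parameters `[30, "TX"]`.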

asked Jul 24 '16 13:07 by mtkachenko




2 Answers

If you have the index first name + last name + age + state, you don't also need the indexes first name + last name + age, first name + last name, or first name. If a user searches only on "first name" and "last name", the database will be able to use the first name + last name + age + state index. As long as the user specifies a leading (leftmost) run of the index's columns, the database can use the index even if not every column is specified. The order in which the conditions are written in the WHERE clause doesn't matter; what matters is which columns are specified.

For instance, if you have the index first name + last name + age + state and the user specifies "first name" and "last name", then the database can use that index to jump straight to the matching rows. However, if the user specifies "first name" and "age", or "first name" and "state", then the database can only partially use the index: it jumps to the rows with matching first names, but then has to scan those rows for the ones that also match "age" or "state". If you want the technical details behind why this is true, read about database indexes and B+ trees.
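You can watch the leftmost-prefix rule in action with an execution plan. The sketch below uses SQLite through Python's sqlite3 module only because it needs no server; the table and index names (`human`, `idx_fn_ln_age_state`) are made up, other databases format their plans differently, and some engines can do a "skip scan" that relaxes this rule in certain cases.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return SQLite's EXPLAIN QUERY PLAN output as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)  # last column holds the readable detail


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE human (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT,"
    " age INTEGER, state TEXT, birthday TEXT, gender TEXT)"
)
conn.execute("CREATE INDEX idx_fn_ln_age_state ON human (first_name, last_name, age, state)")

# Leading columns given: the index is used to seek on first_name AND last_name.
print(query_plan(conn, "SELECT * FROM human WHERE last_name = ? AND first_name = ?", ("Simpson", "Homer")))
# Gap after first_name: only first_name is used to seek; age is checked afterwards.
print(query_plan(conn, "SELECT * FROM human WHERE first_name = ? AND age = ?", ("Homer", 39)))
# Leading column missing: no seek is possible, so SQLite scans the table.
print(query_plan(conn, "SELECT * FROM human WHERE last_name = ?", ("Simpson",)))
```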

Many databases can also combine multiple indexes in a single query. If you have the indexes

`last name`
`state`
`age`

And the user searches for "last name", "state", and "age", the database can use all three indexes to find the matching rows for each field, then combine the results so that only rows matching all three conditions are returned (MySQL calls this an index merge; PostgreSQL does it with bitmap index scans). If you look at the execution plan, you'll be able to see it doing this. Granted, this will be a bit slower than having a single index that contains every necessary field, but it saves you from creating a ton of indexes.
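Conceptually, combining several single-column indexes works like intersecting sets of row ids. Here's a toy model in Python where each "index" is just a dictionary from a value to the set of matching row ids. The helper names are hypothetical, and real engines implement this very differently on disk; this only illustrates the idea.

```python
from collections import defaultdict


def build_index(rows, column):
    """Toy single-column index: maps each value to the set of row ids holding it."""
    index = defaultdict(set)
    for row_id, row in enumerate(rows):
        index[row[column]].add(row_id)
    return index


def intersect_lookup(indexes, criteria):
    """Look up each criterion in its own index, then keep only ids found in every result."""
    # .get() avoids inserting empty entries into the defaultdict for unknown values.
    id_sets = [indexes[column].get(value, set()) for column, value in criteria.items()]
    return set.intersection(*id_sets) if id_sets else set()
```

Each lookup is cheap on its own; the intersection step is the extra work that a single composite index would avoid.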

Also note that even if an index exists, the database may not use it, because another plan may be faster. For instance, take the example above with three separate indexes, and suppose the user searches on "last name", "first name", and "state". Because "last name" has high selectivity (most of its values are close to unique), it may be faster to use the last-name index to fetch the few matching rows and then simply check "first name" and "state" on each of them, than to also use the state index and intersect the rows returned by both indexes.

When you're designing your indexes, an index won't give you much of a performance boost (and may actually be worse than doing a full table scan) if the selectivity of your index is really low. Gender, for instance, is not a good field to have indexed because you only have two possible values. If the user is searching only on gender, you will never get good performance with or without indexes because you will return half your rows.
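Selectivity is easy to measure: divide the number of distinct values in a column by the number of rows. A rough sketch against SQLite; the helper names are hypothetical, and the table and column names are interpolated into the SQL, so only pass trusted identifiers.

```python
import sqlite3


def selectivity(conn: sqlite3.Connection, table: str, column: str) -> float:
    """Distinct values divided by total rows: near 1.0 is very selective, near 0 is not."""
    # Identifiers are interpolated into the SQL, so only pass trusted table/column names.
    distinct, total = conn.execute(
        f"SELECT COUNT(DISTINCT {column}), COUNT(*) FROM {table}"
    ).fetchone()
    return distinct / total if total else 0.0


def avg_rows_per_value(conn: sqlite3.Connection, table: str, column: str) -> float:
    """Average number of rows an equality lookup on `column` would return."""
    distinct, total = conn.execute(
        f"SELECT COUNT(DISTINCT {column}), COUNT(*) FROM {table}"
    ).fetchone()
    return total / distinct if distinct else 0.0
```

On a 1,000-row table with a unique last name per row and a two-valued gender column, `last_name` scores 1.0 while `gender` scores 0.002, i.e. every gender lookup matches about 500 rows.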

Row for row, a full table scan is actually cheaper than using an index. When the database does a table scan, it reads the data pages sequentially, straight off the disk. When it uses an index, it has to go through a few intermediate index pages and then jump to the data page where each matching row is stored. For a field like "gender", where you're going to select half of your rows, the overhead of following index links for half the rows in the table may outweigh the cost of just scanning the entire table without using indexes.
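A back-of-envelope cost model makes the trade-off concrete. The numbers here are assumptions, not measurements: 100 rows per page, a 3-level index, and a random page read costing 4x a sequential one (the same ratio as PostgreSQL's default `random_page_cost` vs `seq_page_cost`). It also pessimistically assumes one data-page read per matching row with no caching.

```python
def table_scan_cost(total_rows, rows_per_page, seq_page_cost=1.0):
    """Read every data page once, in order."""
    pages = -(-total_rows // rows_per_page)  # ceiling division
    return pages * seq_page_cost


def index_lookup_cost(matching_rows, index_depth=3, random_page_cost=4.0):
    """Descend the index once, then fetch one data page (at random) per matching row."""
    return (index_depth + matching_rows) * random_page_cost
```

With these assumptions, scanning a 1,000,000-row table costs 10,000 units, fetching the 500,000 rows of one gender through an index costs 2,000,012, and fetching 10 rows by last name costs just 52.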

I would recommend these indexes:

`first name, last name`
`birthdate`
`state`
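Here is what those three indexes could look like as DDL, sketched in SQLite via Python's sqlite3 module. The schema and index names are hypothetical; `birthdate` is stored as ISO-8601 text so that string comparison matches date order, and gender is deliberately left unindexed.

```python
import sqlite3

# Hypothetical schema; birthdate is ISO-8601 text ('YYYY-MM-DD') so text order matches date order.
SCHEMA = """
CREATE TABLE human (
    id INTEGER PRIMARY KEY,
    first_name TEXT,
    last_name TEXT,
    state TEXT,
    birthdate TEXT,
    gender TEXT  -- not indexed: too few distinct values to help
);
CREATE INDEX idx_human_name ON human (first_name, last_name);
CREATE INDEX idx_human_birthdate ON human (birthdate);
CREATE INDEX idx_human_state ON human (state);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# List the indexes SQLite now has on the table.
index_names = [
    name
    for (name,) in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'index' AND tbl_name = 'human' ORDER BY name"
    )
]
print(index_names)
```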

If you have a specific combination of fields that is searched on frequently, then you can make an index for that too to speed things up. However, don't make an index for every combination of fields.
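To see how fast "every combination" grows, you can count them. With the question's six filter fields there are 2^6 - 1 = 63 non-empty subsets, and if you also count column orderings (which matter for composite indexes) there are 1,956 ordered column lists. A quick check in Python, using snake_case column names that are assumptions:

```python
from itertools import combinations, permutations

# The question's six filter fields (snake_case names are assumptions).
COLUMNS = ["first_name", "last_name", "age", "state", "birthday", "gender"]

# Every non-empty set of fields a user could fill in: 2**6 - 1 of them.
subsets = [c for r in range(1, len(COLUMNS) + 1) for c in combinations(COLUMNS, r)]

# Composite indexes also care about column order, so ordered lists are far more numerous.
ordered = [p for r in range(1, len(COLUMNS) + 1) for p in permutations(COLUMNS, r)]

print(len(subsets), len(ordered))
```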

If you store the full birth date ("birthdate") instead of just the birthday, then you don't need an "age" column at all: you can compute the date range that corresponds to the requested ages and run a BETWEEN query on "birthdate". If you're forced to have separate "birthday" and "age" columns, then you could index "age" as well. However, as another user pointed out, you would have to keep updating the ages as time passes. I strongly recommend against that design.
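Turning an age filter into a birthdate range is mostly date arithmetic. A minimal sketch, assuming age means whole years completed (so someone born on February 29 turns a year older on March 1 in non-leap years); the helper names and the `human`/`birthdate` names in the query string are hypothetical.

```python
from datetime import date, timedelta


def years_before(d: date, years: int) -> date:
    """Same calendar day `years` earlier; Feb 29 falls back to Feb 28 in non-leap years."""
    try:
        return d.replace(year=d.year - years)
    except ValueError:
        return d.replace(year=d.year - years, day=28)


def birthdate_range(min_age: int, max_age: int, today: date) -> tuple:
    """Inclusive (earliest, latest) birthdates for people aged min_age..max_age on `today`."""
    latest = years_before(today, min_age)  # born on or before this: at least min_age
    # Born on or before years_before(today, max_age + 1) means already max_age + 1 or older.
    earliest = years_before(today, max_age + 1) + timedelta(days=1)
    return earliest, latest


def age_on(birthdate: date, today: date) -> int:
    """Whole years completed on `today` (used here only to sanity-check the range)."""
    return today.year - birthdate.year - ((today.month, today.day) < (birthdate.month, birthdate.day))


# Hypothetical query; pass the dates as ISO strings if birthdate is stored as text.
QUERY = "SELECT * FROM human WHERE birthdate BETWEEN ? AND ?"
```

For example, `birthdate_range(30, 40, date(2024, 7, 24))` returns `(date(1983, 7, 25), date(1994, 7, 24))`, which an index on `birthdate` can serve directly.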

One final thing to consider is whether to create a covering index. A covering index is one that contains every column the query needs: both the columns it searches on and the columns it returns. For example, suppose your table has 100 fields, but users are usually only interested in looking up someone's state and age based on their name. So a large percentage of your queries look something like this:

SELECT STATE, AGE FROM PEOPLE WHERE FIRSTNAME = 'Homer' AND LASTNAME = 'Simpson'

If your index is LASTNAME, FIRSTNAME, then the database will look up "Homer" and "Simpson" in your index (which will involve reading a few index pages from disk), use the index pointer to go to the disk page where the data record is stored, read that entire data page, parse it into fields, and then return the state and age.

Now, suppose you run the same query but your index is LASTNAME, FIRSTNAME, STATE, AGE. The database engine will still use your index to look up "Homer" and "Simpson", but once it finds the appropriate index record (exactly the same as how it worked above), that index record already has STATE and AGE. Therefore, the database can get the results of your query straight from the index without having to also read the data page from disk.
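The difference shows up directly in SQLite's query plan, where a covering index is reported as `USING COVERING INDEX`. A sketch with a hypothetical `people` table (SQLite again, only because it's self-contained; the index names are made up):

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return SQLite's EXPLAIN QUERY PLAN output as one string."""
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params))


conn = sqlite3.connect(":memory:")
# Hypothetical PEOPLE table with extra columns the query never reads.
conn.execute(
    "CREATE TABLE people (id INTEGER PRIMARY KEY, firstname TEXT, lastname TEXT,"
    " state TEXT, age INTEGER, address TEXT, phone TEXT, notes TEXT)"
)
QUERY = "SELECT state, age FROM people WHERE firstname = ? AND lastname = ?"
PARAMS = ("Homer", "Simpson")

# Index on the name only: SQLite seeks in the index, then reads the row for state and age.
conn.execute("CREATE INDEX idx_name ON people (lastname, firstname)")
plain_plan = query_plan(conn, QUERY, PARAMS)

# Index that also carries state and age: the table row never needs to be read.
conn.execute("DROP INDEX idx_name")
conn.execute("CREATE INDEX idx_name_cover ON people (lastname, firstname, state, age)")
covering_plan = query_plan(conn, QUERY, PARAMS)

print(plain_plan)
print(covering_plan)
```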

Covering indexes can also drastically improve performance for table scans. Assume you have 100 fields in your table (so a single row is a few hundred bytes or more). Now a user runs this query:

SELECT FIRSTNAME, LASTNAME, AGE FROM PEOPLE

The database would have to read the entire table (including all the fields this query doesn't need) to get your results. If you had an index LASTNAME, FIRSTNAME, AGE, then the database could get the results by scanning the entire index instead of the entire table. Since a single index entry is far smaller byte-wise than a single data row, the query will be much faster.
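The same idea applies to full scans. In this sketch the hypothetical table has 20 filler columns standing in for the "100 fields". Because the index holds every column the query reads and is much narrower than the table, SQLite's planner will typically choose to scan the index instead of the table, which shows up as `USING COVERING INDEX` on a `SCAN` line of the plan. Whether any given engine does this depends on its own cost estimates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical wide table: 20 filler columns stand in for the "100 fields".
filler = ", ".join(f"extra{i} TEXT" for i in range(20))
conn.execute(
    f"CREATE TABLE people (id INTEGER PRIMARY KEY, firstname TEXT, lastname TEXT, age INTEGER, {filler})"
)
# Index holding every column the query below reads.
conn.execute("CREATE INDEX idx_ln_fn_age ON people (lastname, firstname, age)")

# No WHERE clause: the question is whether SQLite reads the table or the narrower index.
plan = " | ".join(
    row[-1] for row in conn.execute("EXPLAIN QUERY PLAN SELECT firstname, lastname, age FROM people")
)
print(plan)
```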

In your particular case with so few fields in your table, a covering index probably wouldn't be very useful since the fields in the index would be the same as the fields in your table, thus defeating the whole purpose. However, for a table with dozens of fields, of which only a handful are commonly queried, a covering index can be a great way to speed up your queries.

answered Sep 19 '22 16:09 by Ben Rubin


Lots of indexes is a 'bad' idea.
Indexes on individual columns won't help much.
One index that is a 'prefix' of another is redundant.
An index on a flag or a column of low 'cardinality' (e.g. gender) won't be used.

Suggestion: Start with one index per column. Then add a second column to each index, picking it based on what is likely to be tested together. Avoid having both (a,b) and (b,a).
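The prefix rule and the (a,b)/(b,a) advice are mechanical enough to check with a script. A small sketch, where `review_indexes` is a hypothetical helper and each index is given as a tuple of column names in key order:

```python
def review_indexes(indexes):
    """Flag indexes that are a leading prefix of another, and (a, b)/(b, a) mirror pairs."""
    warnings = []
    for i, idx in enumerate(indexes):
        for j, other in enumerate(indexes):
            if i == j:
                continue
            # An index whose columns are the leading columns of another adds nothing.
            if len(idx) < len(other) and other[: len(idx)] == idx:
                warnings.append(f"redundant: {idx} is a prefix of {other}")
            # Rule of thumb above: avoid keeping both (a, b) and (b, a); report each pair once.
            if i < j and len(idx) == 2 and other == idx[::-1]:
                warnings.append(f"mirrored pair: {idx} and {other}")
    return warnings
```

Feeding it the current index list and then the queries that real users run makes it easier to prune as the list evolves.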

Then watch what types of queries are generated by 'real' users. Tweak the list of indexes accordingly. This info may lead to a few 3-column indexes.

answered Sep 19 '22 16:09 by Rick James