Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Optimising LIKE expressions that start with wildcards

I have a table in a SQL Server database with an address field (ex. 1 Farnham Road, Guildford, Surrey, GU2XFF) which I want to search with a wildcard before and after the search string.

SELECT *
FROM Table
WHERE Address_Field LIKE '%nham%'

I have around 2 million records in this table and I'm finding that queries take anywhere from 5-10s, which isn't ideal. I believe this is because of the preceding wildcard.

I think I'm right in saying that any indexes won't be used for seek operations because of the preceeding wildcard.

Using full text searching and CONTAINS isn't possible because I want to search for the latter parts of words (I know that you could replace the search string for Guil* in the below query and this would return results). Certainly running the following returns no results

SELECT *
FROM Table
WHERE CONTAINS(Address_Field, '"nham"')

Is there any way to optimise queries with preceding wildcards?

like image 448
hwilson1 Avatar asked Jan 26 '17 17:01

hwilson1


People also ask

Which wildcards are used with LIKE operator?

SQL WildcardsWildcard characters are used with the LIKE operator. The LIKE operator is used in a WHERE clause to search for a specified pattern in a column.

Does index work on like?

Indexes cannot be used with LIKE '%text%' predicates. They can, however with LIKE 'text%'.

Does Like operator use index?

When you do LIKE '%text%' it can't use the index because there can be a variable number of characters before text. What you should be doing is not use a query like that at all. Instead you should use something like FTS (Full Text Search) that MySQL supports for MyISAM tables.

What can I use instead of like in SQL?

You may come across the situation, where you need alternate to “like” keyword of SQL, that is to search for sub-string in columns of the table. The one way to achieve it to use instr() function, instr() function takes 3 parameters in account .


2 Answers

Here is one (not really recommended) solution.

Create a table AddressSubstrings. This table would have multiple rows per address and the primary key of table.

When you insert an address into table, insert substrings starting from each position. So, if you want to insert 'abcd', then you would insert:

  • abcd
  • bcd
  • cd
  • d

along with the unique id of the row in Table. (This can all be done using a trigger.)

Create an index on AddressSubstrings(AddressSubstring).

Then you can phrase your query as:

SELECT *
FROM Table t JOIN
     AddressSubstrings ads
     ON t.table_id = ads.table_id
WHERE ads.AddressSubstring LIKE 'nham%';

Now there will be a matching row starting with nham. So, like should make use of an index (and a full text index also works).

If you are interesting in the right way to handle this problem, a reasonable place to start is the Postgres documentation. This uses a method similar to the above, but using n-grams. The only problem with n-grams for your particular problem is that they require re-writing the comparison as well as changing the storing.

like image 93
Gordon Linoff Avatar answered Sep 30 '22 19:09

Gordon Linoff


I can't offer a complete solution to this difficult problem.

But if you're looking to create a suffix search capability, in which, for example, you'd be able to find the row containing HWilson with ilson and the row containing ABC123000654 with 654, here's a suggestion.

  WHERE REVERSE(textcolumn) LIKE REVERSE('ilson') + '%'

Of course this isn't sargable the way I wrote it here. But many modern DBMSs, including recent versions of SQL server, allow the definition, and indexing, of computed or virtual columns.

I've deployed this technique, to the delight of end users, in a health-care system with lots of record IDs like ABC123000654.

like image 25
O. Jones Avatar answered Sep 30 '22 18:09

O. Jones