Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Search for words in SQL Server index

I need something in between a full text search and an index search:
I want to search for text in one column of my table (probably there will be an index on the column, too, if that matters).

Problem is, I want to search for words in the column, but I don't want to match parts.

For example, my column might contain business names:
Mighty Muck Miller and Partners Inc.
Boy & Butter Breakfast company

Now if I search for "Miller" I want to find the first line. But if I search for "iller" I don't want to find it, because there is no word starting with "iller". Searching for "Break" should find "Boy & Butter Breakfast company", though, since one word is starting with "Break".

So if I try and use

WHERE BusinessName LIKE %Break%

it will find too many hits.

Is there any way to Search for Words separated by whitespace or other delimiters?

(LINQ would be best, plain SQL would do, too)

Important: Spaces are by far not the only delimiters! Slashes, colons, dots, all non-alphanumerical characters should be considered for this to work!

like image 603
Sam Avatar asked Oct 01 '08 10:10

Sam


1 Answers

Your word delimiters are going to be many: space, tab, beginning of line, parentheses, periods, commas, exclamation/question marks etc. So, a pretty simple solution is to use a regex in your WHERE clause. (And it's going to be a lot more efficient than just ORing every possible delimiter you can think of.)

Since you mentioned LINQ, here's an article that describes how to do efficient regex querying with SQL Server.

Complicated WHERE clauses like this always raise a red flag with me as far as performance is concerned, so I definitely suggest benchmarking whatever you end up with, you may decide to build a search index for the column after all.

EDIT: Saw you edited your question. When writing your regex, it's easy to just have it use any non-alphanum character as a delimiter, i.e. [^0-9a-zA-Z], or \W for any non-word character, \b for any word boundary and \B for any non-word boundary. Or, instead of matching delimiters, just match any word, i.e. \w+. Here's another example of someone doing regex searches with SQL Server (more complicated than what you'd need).

like image 146
joelhardi Avatar answered Sep 28 '22 09:09

joelhardi