I have a user table with a single column for a person's name:
CREATE TABLE [dbo].[Users]
(
Id bigint NOT NULL,
Name nvarchar(80) NOT NULL,
PRIMARY KEY CLUSTERED (Id ASC)
)
The Name
column can contain either a full name or just the first name or anything really (separated by spaces). In order to implement a search on Name
, I would like to utilize SQL's full-text search, but not sure if it's suitable for searching names/nicknames and not actual words. Also the question is - which language do I choose when creating the FT index on Name
?
Any other considerations?
Thank you.
Full-text search is meant to search large amounts of text. For example, a search engine will use a full-text search to look for keywords in all the web pages that it indexed. The key to this technique is indexing.
A Full-Text query allows you to search for words inside text data. For example, with a Full-Text query you can search for the word “werewolf” in the title of any of our movies. The basic unit of Full-Text search is often referred to as a token rather than a word.
Full-text queries perform linguistic searches against text data in full-text indexes by operating on words and phrases based on the rules of a particular language such as English or Japanese. Full-text queries can include simple words and phrases or multiple forms of a word or phrase.
Like uses wildcards only, and isn't all that powerful. Full text allows much more complex searching, including And, Or, Not, even similar sounding results (SOUNDEX) and many more items.
At first sight, I would recommend using the LIKE operator rather than a full-text query.
Make sure you search case insensitive and probably accent insensitive. This can be achieved by the setting the correct collation on either the server, database, the table column or in your query. In the query, this is done by something like:
SELECT *
FROM [dbo].[Users]
WHERE Name LIKE '%niaher%' COLLATE SQL_Latin1_General_CP1_CI_AI
If you use a full-text index, you get all sorts of features, like verb stemming and thesaurus, see Linguistic Components and Language Support in Full-Text Search, which you do not need, when searching in a list of names. By the way, these features are language dependent and that's why you specify a language on the full-text index.
Application of a stoplist you might even want to avoid. At least I would, since in Dutch many surnames start with articles and/or prepositions: "Rembrandt van Rijn". "van" would be in a Dutch stoplist for sure and prevent any match on a search term that contains "van".
If you run into performance issues, it might be useful to try a full-text index and search using CONTAINS with a simple term.
SELECT *
FROM [dbo].[Users]
WHERE CONTAINS(Name, 'niaher')
Please note that full-text indexes are updated asynchronously.
It seems that if you want to search multi-part names, full-text search is the easiest and most appropriate approach (please correct me if I'm wrong). The other alternative being LIKE '%query%'
, however it has too many disadvantages:
So I went ahead and implemented a full-text search. My queries look something like this:
SELECT * FROM Users WHERE CONTAINS(Name, '"John*"')
The only slight difficulty is that I had to convert user query (John) into a CONTAINS-friendly query ("John*"). To do that, I implemented this method in my UserRepository:
/// <summary>
/// Converts user-entered search query into a query that can be consumed by CONTAINS keyword of SQL Server.
/// </summary>
/// <example>If query is "John S Ju", the result will be "\"John*\" AND \"S*\" AND \"Ju*\"".</example>
/// <param name="query">Query entered by user.</param>
/// <returns>String instance.</returns>
public static string GetContainsQuery(string query)
{
string containsQuery = string.Empty;
var terms = query.Split(new[] { ' ' }, StringSplitOptions.None);
if (terms.Length > 1)
{
for (int i = 0; i < terms.Length; i++)
{
string term = terms[i].Trim();
// Add wildcard term, e.g. - "term*". The reason to add wildcard is because we want
// to allow search by partially entered name parts (partially entered first name and/or
// partially entered last name, etc).
containsQuery += "\"" + term + "*\"";
// If it's not the last term.
if (i < terms.Length - 1)
{
// We want all terms inside user query to match.
containsQuery += " AND ";
}
}
containsQuery = containsQuery.Trim();
}
else
{
containsQuery = "\"" + query + "*\"";
}
return containsQuery;
}
Hope this helps anyone stumbling into the same issue.
PS - I wrote a blogpost documenting this.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With