Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Full text search - Contains plus wildcard and single quote

I have a table with a name field with this

Test O'neill 123

If I use

SELECT  *
    FROM    table F
    WHERE   CONTAINS ( F.*, '"Test O''neill 123"' )

it works fine but if I use a wildcard * I get no results.

SELECT  *
    FROM    table f
    WHERE   CONTAINS ( F.*, '"Test O''neill 123*"' )

why is this ? I am using a parser for my search terms and this is adding the wildcard *

I checked some sites, about escaping the ' but I haven't found anything referred to this..

Thanks in Advance

like image 712
Gonzalo.- Avatar asked Sep 04 '15 18:09

Gonzalo.-


People also ask

What is full-text search example?

A Full-Text query allows you to search for words inside text data. For example, with a Full-Text query you can search for the word “werewolf” in the title of any of our movies. The basic unit of Full-Text search is often referred to as a token rather than a word.

Which is used for full-text search in SQL?

Full-Text Engine. The Full-Text Engine in SQL Server is fully integrated with the query processor. The Full-Text Engine compiles and executes full-text queries. As part of query execution, the Full-Text Engine might receive input from the thesaurus and stoplist.

What is full-text search in MySQL?

Full-Text Search in MySQL server lets users run full-text queries against character-based data in MySQL tables. You must create a full-text index on the table before you run full-text queries on a table. The full-text index can include one or more character-based columns in the table.

What is full-text search vs LIKE?

FTS involves indexing the individual words within a text field in order to make searching through many records quick. Using LIKE still requires you to do a string search (linear or the like) within the field.


1 Answers

The problem is due to the combination of 1) using the Neutral language 2) plus a stoplist for your full text index 3) plus unexpected behavior when using a wildcard in a search that includes stopwords.

The Neutral language doesn't cover all of the nuances of the English language, so at index-time it considers O'neill to be 2 separate words O and neill. Then your stoplist considers O to be a stopword so this "word" is not added to the index, only neill is.

At search-time, the search engine typically ignores stop words in multi-word phrases. For example, searching for Contains(*, '"we x people"') will match the text ...we the people..., x and the both being stopwords and thus automatically "matching" each other. (I use the term "matching" loosely because the search engine is not matching the stopwords, but rather it knows that people is 1 word away from we.)

So you might expect the wildcard search Contains(*, '"we the people*"') to also find its match, except that it does not when using a stoplist. If it weren't for the stopword the in the search phrase, or if the was not considered a stopword, the search would work fine. I really can't explain this behavior but I suspect it has something to do with the way the word positions are computed. I also suspect it is not the intended behavior.

So back to your case, Contains(*, '"Test O''neill 123"') will find a match but the wildcard search Contains(*, '"Test O''neill 123*"') does not. (You can even simplify the search to Contains(*, '"O''neill*"') and you'll see that it still does not find a match.) The combination of the stopword O with a wildcard runs into the problem I explained in the last paragraph. This is the crux of the problem stated in your question.

Solutions ranging from most-effective to least-effective-but-possibly-more-practical-for-your-case:

1) Change the language on your full text index to English and re-index. This will cause O'neill to be treated as 1 word and thus you'll avoid the weird wildcard behavior that I explained. You can change the language in the full text index properties via SQL Server Management Studio or by dropping and recreating the index as follows:

ALTER FULLTEXT INDEX ON MyTable DROP (Column1) 
GO
ALTER FULLTEXT INDEX ON MyTable ADD (Column1 LANGUAGE [English])
-- repeat for each column in the index

2) If you need to keep using the Neutral language, consider removing O from your stoplist and re-index.

ALTER FULLTEXT STOPLIST MyStoplist DROP 'o' LANGUAGE 'Neutral';

3) Or don't use a stoplist if you don't need one.

ALTER FULLTEXT INDEX ON MyTable SET STOPLIST = OFF

4) If none of the above solutions are practical, consider removing stopwords from the search phrase, or at least the O' prefix in surnames.

like image 127
Keith Avatar answered Dec 06 '22 21:12

Keith