Using SQL Server 2012 (in general using SQL Server 2008 R2 up to SQL Server 2016)
This question is a more specific re wording of SQL-Server Full Text Index Unexpected results. Please see here for how we have got to this point and what has been tried already.
I am re-posting now we have found the specific fault. Many thanks to @HoneyBadger.
His help has been invaluable getting to this point.
Table structure:
CREATE TABLE TestFullTextSearch (Id INT NOT NULL, AllText NVARCHAR(400))
CREATE UNIQUE INDEX test_tfts ON TestFullTextSearch(Id)
CREATE FULLTEXT CATALOG ftcat_tfts
CREATE FULLTEXT INDEX ON TestFullTextSearch(AllText)
KEY INDEX test_tfts ON ftcat_tfts
WITH CHANGE_TRACKING AUTO, STOPLIST OFF
Data:
INSERT INTO TestFullTextSearch
VALUES (1, ' 123_456 789 '), (2, ' 789 123_456 '),
(3, ' 123_456 ABC '), (4, ' ABC 123_456 ')
Please note this data is purely to demonstrate the issue, and is not indicative of a live data set. Our live data sets can be over 500,000 rows, with paragraphs of data in a single field being searched - hence using full text searches.
Select 1: Results as expected
SELECT *
FROM TestFullTextSearch
WHERE CONTAINS (AllText, '"123*"')
Id AllText
----------- ------------
1 123_456 789
2 789 123_456
3 123_456 ABC
4 ABC 123_456
SELECT 2: Misses row 2 in the result set
SELECT *
FROM TestFullTextSearch
WHERE CONTAINS (AllText, '"123_*"')
Id AllText
----------- ------------
1 123_456 789
3 123_456 ABC
4 ABC 123_456
SELECT 3: Only returns row 2
SELECT *
FROM TestFullTextSearch
WHERE CONTAINS (AllText, '"123\_*"')
Id AllText
----------- ------------
2 789 123_456
Conclusion: Searching for a number string suffixed with an underscore fails if preceding word is a number string.
Problem: Our clients use full text search and expect a result to be forth coming around part numbers and catalogue references, which may or may not be in a section of text including other number strings. Full text search does not seem to support this in a consistent manner.
Any help gratefully received.
NB: This issue does not happen on SQL SERVER 2008, but does on 2012+
I have also tried switching to the older version of FTS parser. Testing with
SELECT * FROM sys.dm_fts_parser (' "789 123_456" ',1033,0,0)
SELECT * FROM sys.dm_fts_parser (' "789 123_456" ',2057,0,0)
I had with current parser:
And after reverting to the legacy parser:
So it has had an effect, however I'm still getting the same results.
Are there any other differences in the full text search between 2008 & 2012 that could be having this effect?
The SQL Server process uses the following components for full-text search: User tables. These tables contain the data to be full-text indexed. Full-text gatherer. The full-text gatherer works with the full-text crawl threads. Thesaurus files. These files contain synonyms of search terms. Stoplist objects.
A full-text index stores information about significant words and their location within one or more columns of a database table. A full-text index is a special type of token-based functional index that is built and maintained by the Full-Text Engine for SQL Server.
The full text portion of a query is performed by the full text engine. When a full text crawl is initiated, the full text engine (one of the SQL Server Process Components) pushes large batches of data into memory and invokes the filter daemon host for further processing. The filter daemon host is a process that is started by the Full-Text Engine.
In contrast to full-text search, the LIKE Transact-SQL predicate works on character patterns only. Also, you cannot use the LIKE predicate to query formatted binary data. Furthermore, a LIKE query against a large amount of unstructured text data is much slower than an equivalent full-text query against the same data.
They changed the full text parsers/stemmers between SQL 2008 and SQL 2012.
With a registry change, you can use the legacy parser, which should work better in your situation.
See https://technet.microsoft.com/en-us/library/gg509108(v=sql.110).aspx for details.
If you need to support both old and new style, then you can revert US English to the old and keep UK English the new (or vice versa)
Using SQL 2016, I reverted UK English and kept US English the same:
exec sp_help_fulltext_system_components 'wordbreaker', 1033
exec sp_help_fulltext_system_components 'wordbreaker', 2057
Returns:
I created another table using UK English and populated it.
CREATE TABLE TestFullTextSearch2 (Id INT NOT NULL, AllText NVARCHAR(400))
CREATE UNIQUE INDEX test_tfts2 ON TestFullTextSearch2(Id)
CREATE FULLTEXT INDEX ON TestFullTextSearch2(AllText language 2057)
KEY INDEX test_tfts2 ON ftcat_tfts
WITH CHANGE_TRACKING AUTO, STOPLIST OFF
INSERT INTO TestFullTextSearch2
VALUES (1, ' 123_456 789 '), (2, ' 789 123_456 '),
(3, ' 123_456 ABC '), (4, ' ABC 123_456 ')
I'm getting the expected 4 results for all 3 queries.
Verify that your changes have taken effect.
exec sp_help_fulltext_system_components 'wordbreaker', 1033
exec sp_help_fulltext_system_components 'wordbreaker', 2057
select t.name, c.* from sys.tables t inner join sys.fulltext_index_columns c on t.object_id = c.object_id
The problem here is basically a difference in how MSSQL 2012 stores the index and how the query itself is handling the underscore _.
This becomes clear when inspecting the index keywords & fts parser. For row 2 the keyword 123_456 is not stored as such, because of the numeric value coming in front of it. However the fts parser will search for an exact match on "123_" and does not remove the underscore.
select * from sys.dm_fts_index_keywords_by_document
(
DB_ID('TestDatabase'),
OBJECT_ID('TestFullTextSearch')
) order by document_id
select * from sys.dm_fts_parser('"123_*"', 0, 0, 0)
One solution would be to change the word breaker for a specific language. You could easily replace it with the word breaker dll from MSSQL 2008 or 2016 where this problem does not occur. (e.g. take the one for Neutral Language: NaturalLanguage6.dll). Make sure to create the Full Text index for the same language.
To find the registered word breakers and where dll's are located use this query:
EXEC sp_help_fulltext_system_components 'wordbreaker';
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With