
SQL Server Full Text Search ampersand (&)

I have a problem with the ampersand (&) in full-text search.

How can I search for words (or phrases) that contain an ampersand (&)?

For example, the database contains:

1: "Johnson & Johnson"
2: "AT&T"
3: "Sample & Sample"

How should I write a full-text search query that finds each of these records individually?

SELECT * from Companies c WHERE CONTAINS(c.CompanyName, '"AT&T"')

I know that the ampersand (&) acts as the logical AND operator in a CONTAINS search condition, but I do not know how to escape it so that full-text search treats it as literal text.

Any ideas?

asked Oct 10 '11 15:10 by czarczarek



2 Answers

Short version: You can't (or rather you can, but you may get more results than you expected).

Long version: The character '&' is treated as a "word breaker", i.e. when SQL Server encounters an '&' it treats it as the start of a new "word" (i.e. token). What SQL Server sees when parsing "AT&T" is two tokens, "AT" and "T".

You can check this for yourself using sys.dm_fts_parser:

SELECT * FROM sys.dm_fts_parser('AT&T', 1033, 0, 0)

keyword     group_id    phrase_id   occurrence  special_term  display_term  expansion_type source_term
----------- ----------- ----------- ----------- ------------- ------------- -------------- -----------
0x00610074  1           0           1           Noise Word    at            0              AT
0x0074      2           0           1           Noise Word    t             0              T

This means that searching for "AT&T" is pretty much the same as searching for "AT T".

This is by design. As far as I can see, the only way to change this behaviour would be to install your own word breaker, which isn't something I would recommend.
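
One way to trim those extra matches without replacing the word breaker is to let full-text search narrow down the candidates and then re-check the literal text with LIKE. This is only a sketch and not something tested here; it assumes the Companies table and CompanyName column from the question, and that "johnson" is not in the stoplist.

```sql
-- Sketch: CONTAINS uses the full-text index to find candidate rows,
-- then LIKE keeps only rows whose text really contains the ampersand.
-- Companies / CompanyName are taken from the question.
SELECT c.*
FROM Companies AS c
WHERE CONTAINS(c.CompanyName, N'"Johnson & Johnson"')
  AND c.CompanyName LIKE N'%Johnson & Johnson%';
```

The '&' has no special meaning in a LIKE pattern, so the second predicate compares the literal text (subject to the column's collation).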

answered Oct 12 '22 18:10 by Justin


The accepted answer isn't entirely correct. Enclosing the search term in double quotes makes the group of words a "phrase" match. In that case the ampersand ( & ) can be treated as a literal character, for example when it is surrounded by letters that do not form a known word. Just looking at your "AT&T" example, we see:

DECLARE @Term NVARCHAR(100);
SET @Term = N'"AT&T"';

SELECT * FROM sys.dm_fts_parser(@Term, 1033, 0, 0);
SELECT * FROM sys.dm_fts_parser(@Term, 1033, 0, 1);
SELECT * FROM sys.dm_fts_parser(@Term, 1033, NULL, 0);
SELECT * FROM sys.dm_fts_parser(@Term, 1033, NULL, 1);
GO

Returns:

keyword             group  phrase  occurrence  special      display  expansion  source
                    id     id                  term         term     type       term
0x0061007400260074  1      0       1           Exact Match  at&t     0          AT&T

As you can see, the ampersand presents no problem at all as long as the term is enclosed in double quotes ( " ), which you are already doing, woo hoo!

However, that doesn't work as cleanly for the "Johnson & Johnson" example:

DECLARE @Term NVARCHAR(100);
SET @Term = N'"Johnson & Johnson"';

SELECT * FROM sys.dm_fts_parser(@Term, 1033, 0, 0);
SELECT * FROM sys.dm_fts_parser(@Term, 1033, 0, 1);
SELECT * FROM sys.dm_fts_parser(@Term, 1033, NULL, 0);
SELECT * FROM sys.dm_fts_parser(@Term, 1033, NULL, 1);
GO

Returns:

keyword                         group  phrase  occurrence  special      display  expansion  source
                                id     id                  term         term     type       term
0x006A006F0068006E0073006F006E  1      0       1           Exact Match  johnson  0          Johnson & Johnson
0x006A006F0068006E0073006F006E  1      0       2           Exact Match  johnson  0          Johnson & Johnson

Both tokens are just johnson, so this term would also seem to match the text Johnson Johnson, which isn't technically correct.

So, in addition to enclosing the term in double quotes, you can also convert the ampersand to an underscore ( _ ), which is handled differently:

DECLARE @Term NVARCHAR(100);
SET @Term = N'"Johnson _ Johnson"';

SELECT * FROM sys.dm_fts_parser(@Term, 1033, 0, 0);
SELECT * FROM sys.dm_fts_parser(@Term, 1033, 0, 1);
SELECT * FROM sys.dm_fts_parser(@Term, 1033, NULL, 0);
SELECT * FROM sys.dm_fts_parser(@Term, 1033, NULL, 1);
GO

Returns:

keyword                         group  phrase  occurrence  special      display  expansion  source
                                id     id                  term         term     type       term
0x006A006F0068006E0073006F006E  1      0       1           Exact Match  johnson  0          Johnson _ Johnson
0x005F                          1      0       2           Exact Match  _        0          Johnson _ Johnson
0x006A006F0068006E0073006F006E  1      0       3           Exact Match  johnson  0          Johnson _ Johnson

And that one-character translation does not seem to adversely affect the original "AT&T" search:

DECLARE @Term NVARCHAR(100);
SET @Term = N'"AT_T"';

SELECT * FROM sys.dm_fts_parser(@Term, 1033, 0, 0);
SELECT * FROM sys.dm_fts_parser(@Term, 1033, 0, 1);
SELECT * FROM sys.dm_fts_parser(@Term, 1033, NULL, 0);
SELECT * FROM sys.dm_fts_parser(@Term, 1033, NULL, 1);

Returns:

keyword             group  phrase  occurrence  special      display  expansion  source
                    id     id                  term         term     type       term
0x00610074005F0074  1      0       1           Exact Match  at_t     0          AT_T
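
Putting the pieces above together, a query could build the search term from the raw input before handing it to CONTAINS. The following is a minimal sketch: the Companies table and CompanyName column come from the question, while @Input and @Term are names chosen here for illustration. It does not handle input that itself contains double quotes.

```sql
DECLARE @Input NVARCHAR(100);
DECLARE @Term  NVARCHAR(110);

SET @Input = N'Johnson & Johnson';

-- Swap the ampersand for an underscore, then wrap the result in double
-- quotes so that it is parsed as a single phrase.
SET @Term = N'"' + REPLACE(@Input, N'&', N'_') + N'"';

-- Optional: inspect how the term will be tokenized before searching.
SELECT * FROM sys.dm_fts_parser(@Term, 1033, 0, 0);

SELECT c.*
FROM Companies AS c
WHERE CONTAINS(c.CompanyName, @Term);
```

Whether the expected rows come back depends on how the indexed text itself was tokenized (language and word breaker version), so it is worth checking the result against real data.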

answered Oct 12 '22 17:10 by Solomon Rutzky