Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Full-text search does not find some prefixes

I'm using SQL Server 2014's full-text search capabilites to find documents in a database that start with a given prefix. Some queries, however, do not yield any results, while they should.

Take the following example:

SELECT * FROM [Profile].[DocumentView] WHERE CONTAINS(Content, '"Friedenseins*"')

(24 row(s) affected)

SELECT * FROM [Profile].[DocumentView] WHERE CONTAINS(Content, '"Friedensein*"')

(0 row(s) affected)

SELECT * FROM [Profile].[DocumentView] WHERE CONTAINS(Content, '"Friedensei*"')

(29 row(s) affected)

I understand the first and third result, but not the second one. The stoplist for the full-text index is switched off. The language for the wordbreaker is set to German.

EDIT:
The suggestion to use FREETEXT instead is not a solution for this particular case, as I need CONTAINS's proximity search feature.

like image 655
Mathias Becher Avatar asked Nov 18 '14 15:11

Mathias Becher


2 Answers

Thank you for posting this question Mathias. I just wanted to point out the suggestion from Microsoft (via https://connect.microsoft.com/SQLServer/feedback/details/1032815) was to switch to the format

 SELECT * FROM [Profile].[DocumentView] WHERE FREETEXT(Content, '"Friedensein*"')

While this doesn't help Mathias, it was able to help me out and might help the reader.

In my case, I had a problem with searching a date string stored in a text field (nvarchar). My simple example was that this DIDN'T WORK:

select * from Sample
where CONTAINS(*, '"1/5/2015*"')

... returns nothing. However, when I switch to FREETEXT as follows, it does return as expected.

select * from Sample
where FREETEXT(*, '"1/5/2015*"')

I am also using SQL 2014. Please vote up the MS Connect link if you are also having this problem.

UPDATE 1/8/2015:

I have since discovered that this solution does not work in my case since FREETEXT brings back other date values that are "similar" or so it seems. With my example of 1/5/2015 above, I'm also getting back hits on 1/2/2015, 1/6/2015 and so on. In fact, if I query against 12/5/2014 (former year) I get hits all across the year, so the "similar" seems to be defined by the year in this case.

For reference, here is the SQL documentation: http://msdn.microsoft.com/en-us/library/ms176078.aspx

like image 197
ebol2000 Avatar answered Oct 20 '22 00:10

ebol2000


The problem is with the word-breakers, which do not include the phrase id, so the second query gets translated into the equivalent of (pseudo-SQL)

SELECT * FROM [Profile].[DocumentView] WHERE Content LIKE
    ((friedensein* OR frieden*) in Position 1) AND (sein* in Position 2)

But "Friedenseinsätze" has "einsätze" in Position 2. With the proper phrase id, the query would become

SELECT * FROM [Profile].[DocumentView] WHERE Content LIKE
    ((friedensein* in Position 1) OR ((frieden* in Position 1) AND (sein* in Position 2)))

The Office team has been informed. The Connect Thread will be updated when new information is available.

UPDATE: The issue was resolved as Won't Fix.

like image 1
Mathias Becher Avatar answered Oct 19 '22 23:10

Mathias Becher