
Azure Search and Dashes

I am using Azure Search and trying to perform a search against documents:

It seems as though doing this: /indexes/blah/docs?api-version=2015-02-28&search=abc\-1003

returns the same results as this: /indexes/blah/docs?api-version=2015-02-28&search=abc-1003

Shouldn't the first one return different results than the second because of the escaping backslash? From what I understand, the backslash should allow an exact search on the whole string "abc-1003" instead of treating the dash as a "not" operator.

(more info here: https://msdn.microsoft.com/en-us/library/azure/dn798920.aspx)

The only way I can get it to work is by doing this (note the double quotes): /indexes/blah/docs?api-version=2015-02-28&search="abc-1003"

I would rather not do that, because it would mean making the user type the quotes, which they will not know to do.

Am I expecting something I shouldn't or is it possibly a bug with Azure Search?

caj asked Jun 02 '16 20:06


2 Answers

First, a dash that is not preceded by whitespace is treated as a literal dash, not as the negation operator.

As per the MSDN docs for simple query syntax:

"-" only needs to be escaped if it's the first character after whitespace, not if it's in the middle of a term. For example, "wi-fi" is a single term.

Second, unless you are using a custom analyzer for your index, the analyzer treats the dash almost like whitespace and breaks abc-1003 into two tokens, abc and 1003.

Then, when you put it in quotes ("abc-1003"), it is treated as a search for the phrase abc 1003, which returns what you expect.
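
Since the asker does not want users to type the quotes themselves, the application could add them before sending the query. Here is a minimal Python sketch of that idea. The function name is hypothetical, the index name and api-version are the ones from the question, and escaping an embedded double quote with a backslash is my reading of the simple query syntax rules rather than something I have tested against the service.

```python
from urllib.parse import quote, urlencode


def build_phrase_search_url(index_name, user_input, api_version="2015-02-28"):
    """Return a relative search URL that sends user_input as a quoted phrase."""
    # Assumption: an embedded double quote is escaped with a backslash so it
    # cannot end the phrase early.
    escaped = user_input.replace('"', '\\"')
    phrase = f'"{escaped}"'
    # quote_via=quote percent-encodes spaces as %20 instead of '+'.
    query = urlencode({"api-version": api_version, "search": phrase}, quote_via=quote)
    return f"/indexes/{index_name}/docs?{query}"


print(build_phrase_search_url("blah", "abc-1003"))
# /indexes/blah/docs?api-version=2015-02-28&search=%22abc-1003%22
```

The user still types abc-1003; only the request that reaches the service carries the quotes.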

If you want an exact match on abc-1003, consider using a filter instead. It is faster and can match things like GUIDs or text with dashes.
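
A filter request could be built roughly like this. This is only a sketch: the field name `code` is hypothetical (the question does not say which field holds the value), the field would have to be marked filterable in the index, and the helper name is mine.

```python
from urllib.parse import quote, urlencode


def build_filter_url(index_name, field, value, api_version="2015-02-28"):
    """Return a relative URL that filters on an exact string value."""
    # OData string literals use single quotes; an embedded quote is doubled.
    literal = "'" + value.replace("'", "''") + "'"
    params = {
        "api-version": api_version,
        "search": "*",  # match everything, let the filter do the narrowing
        "$filter": f"{field} eq {literal}",
    }
    # Keep '$' and '*' readable; everything else is percent-encoded.
    query = urlencode(params, safe="$*", quote_via=quote)
    return f"/indexes/{index_name}/docs?{query}"


print(build_filter_url("blah", "code", "abc-1003"))
# /indexes/blah/docs?api-version=2015-02-28&search=*&$filter=code%20eq%20%27abc-1003%27
```

Because the filter compares the stored value as a whole, the dash never goes through the analyzer.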

Sean Saleh answered Nov 04 '22 10:11


The documentation says that a hyphen "-" is treated as a special character that must be escaped.
In reality a hyphen is treated as a split of the token and words on both sides are searched, as Sean Saleh pointed out.
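
To make the difference concrete, here is a rough Python imitation of the two behaviours. It is not the real analyzer code: the actual analyzers follow Unicode text segmentation rules, and both function names are made up for this illustration.

```python
import re


def standard_like_tokens(text):
    """Rough stand-in for the default analyzer: lowercase, split on non-alphanumerics."""
    return re.findall(r"[a-z0-9]+", text.lower())


def whitespace_tokens(text):
    """Rough stand-in for the whitespace analyzer: split on whitespace only."""
    return text.split()


print(standard_like_tokens("abc-1003"))  # ['abc', '1003']
print(whitespace_tokens("abc-1003"))     # ['abc-1003']
```

With the first behaviour a query for abc-1003 becomes a search for two separate terms; with the second the value stays a single token.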

After a small investigation, I found that you do not need a custom analyzer; the built-in whitespace analyzer will do.
Here is how you can use it:

{
    "name": "example-index-name",
    "fields": [
        {
            "name": "name",
            "type": "Edm.String",  
            "analyzer": "whitespace",
            ...
        },
    ],
...
}

You use this endpoint to update your index:

https://{service-name}.search.windows.net/indexes/{index-name}?api-version=2017-11-11&allowIndexDowntime=true

Do not forget to include the api-key in the request header.
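
Put together, the update request might look like the following Python sketch. It only builds the request object and does not send it. The helper name, the service name, the key, and the minimal two-field definition are placeholders of mine, and using PUT for the update is my understanding of the REST API rather than something stated above.

```python
import json
from urllib.request import Request


def build_index_update_request(service_name, index_definition, api_key,
                               api_version="2017-11-11"):
    """Build (but do not send) the request that updates an index definition."""
    index_name = index_definition["name"]
    url = (f"https://{service_name}.search.windows.net/indexes/{index_name}"
           f"?api-version={api_version}&allowIndexDowntime=true")
    return Request(
        url,
        data=json.dumps(index_definition).encode("utf-8"),
        headers={"Content-Type": "application/json", "api-key": api_key},
        method="PUT",  # assumption: index updates are sent as PUT
    )


# Hypothetical minimal definition; a real index has more fields and options.
definition = {
    "name": "example-index-name",
    "fields": [
        {"name": "id", "type": "Edm.String", "key": True},
        {"name": "name", "type": "Edm.String", "analyzer": "whitespace"},
    ],
}
request = build_index_update_request("my-service", definition, "<your-api-key>")
print(request.get_method(), request.full_url)
# To send it: urllib.request.urlopen(request)
```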

You can also test this and other analyzers through the analyzer test endpoint:

{
  "text": "Text to analyze",
  "analyzer": "whitespace"
}
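
For completeness, a Python sketch that wraps this body in a request. The `/analyze` path under the index and the POST method are what I recall from the REST reference, so treat them as assumptions; the helper name, service name, and key are placeholders.

```python
import json
from urllib.request import Request


def build_analyze_request(service_name, index_name, text, analyzer, api_key,
                          api_version="2017-11-11"):
    """Build (but do not send) a request to the analyzer test endpoint."""
    # Assumption: the test endpoint lives at /indexes/{index-name}/analyze.
    url = (f"https://{service_name}.search.windows.net/indexes/{index_name}/analyze"
           f"?api-version={api_version}")
    body = {"text": text, "analyzer": analyzer}
    return Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "api-key": api_key},
        method="POST",
    )


analyze = build_analyze_request("my-service", "example-index-name",
                                "abc-1003", "whitespace", "<your-api-key>")
print(analyze.get_method(), analyze.full_url)
# To send it: urllib.request.urlopen(analyze)
```

The response should list the tokens the chosen analyzer produces, which makes it easy to check how abc-1003 is split before changing the index.
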
Farrukh Normuradov answered Nov 04 '22 11:11