
How to apply synonyms at query time instead of index time in Elasticsearch

According to the Elasticsearch reference documentation, synonym expansion can be applied at either index time or query time:

Expansion can be applied either at index time or at query time. Each has advantages (⬆)︎ and disadvantages (⬇)︎. When to use which comes down to performance versus flexibility.

The advantages and disadvantages all make sense, and for my use case I want to apply synonyms at query time. I want admin users in my system to be able to curate these synonyms without having to reindex everything on each update. I'd also like to do it without closing and reopening the index.

The main reason I believe this is possible is this advantage:

(⬆)︎ Synonym rules can be updated without reindexing documents.

However, I can't find any documentation describing how to apply synonyms at query time instead of index time.

To use a concrete example, if I do the following (example stolen and slightly modified from the reference), it seems like this would apply the synonyms at index time:

/* NOTE: This was all run against elasticsearch 1.5 (if that matters; documentation is identical in 2.x) */

// Create our synonyms filter and analyzer on the index
PUT my_synonyms_test
{
  "settings": {
    "analysis": {
      "filter": {
        "my_synonym_filter": {
          "type": "synonym",
          "synonyms": [
            "queen,monarch"
          ]
        }
      },
      "analyzer": {
        "my_synonyms": {
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "my_synonym_filter"
          ]
        }
      }
    }
  }
}

// Create a mapping that uses this analyzer
PUT my_synonyms_test/rulers/_mapping
{
  "properties": {
    "name": {
      "type": "string"
    },
    "title": {
      "type": "string",
      "analyzer": "my_synonyms"
    }
  }
}

// Some data
PUT my_synonyms_test/rulers/1
{
  "name": "Elizabeth II",
  "title": "Queen"
}

// A query which utilises the synonyms
GET my_synonyms_test/rulers/_search
{
  "query": {
    "match": {
      "title": "monarch"
    }
  }
}

// And we get our expected result back:
{
   "took": 42,
   "timed_out": false,
   "_shards": {
      "total": 1,
      "successful": 1,
      "failed": 0
   },
   "hits": {
      "total": 1,
      "max_score": 1.4142135,
      "hits": [
         {
            "_index": "my_synonyms_test",
            "_type": "rulers",
            "_id": "1",
            "_score": 1.4142135,
            "_source": {
               "name": "Elizabeth II",
               "title": "Queen"
            }
         }
      ]
   }
}

So my question is: how could I amend the above example so that I would be using the synonyms at query time?

Or am I barking up the wrong tree entirely? If so, could you point me somewhere else, please? I've looked at the plugins mentioned in answers to similar questions, like https://stackoverflow.com/a/34210587/2240218 and https://stackoverflow.com/a/18481495/2240218, but they all seem to be a couple of years old and unmaintained, so I'd prefer to avoid them.

asked Feb 06 '17 15:02 by seddy


2 Answers

Simply use search_analyzer instead of analyzer in your mapping, and your synonym analyzer will only be used at search time:

// Same mapping as in the question, but "title" uses search_analyzer rather than analyzer
PUT my_synonyms_test/rulers/_mapping
{
  "properties": {
    "name": {
      "type": "string"
    },
    "title": {
      "type": "string",
      "search_analyzer": "my_synonyms"
    }
  }
}
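
If you generate your mappings programmatically, the change above can be sketched in Python. This is only an illustration: to_query_time is a hypothetical helper, not part of any Elasticsearch client, and it deliberately differs from the mapping above by also naming an explicit index-time analyzer ("standard" here, an assumption). As far as I understand, some later Elasticsearch versions reject a field that sets search_analyzer without analyzer, so spelling it out may be the safer choice on a fresh index.

```python
import copy


def to_query_time(properties, index_analyzer="standard"):
    """Return a copy of mapping properties where each field's analyzer
    is moved to search_analyzer (hypothetical helper, for illustration).

    index_analyzer is an assumption: the analyzer to use at index time
    once the synonym analyzer is only applied to queries.
    """
    result = copy.deepcopy(properties)
    for field in result.values():
        if "analyzer" in field:
            # The synonym analyzer now only runs on the query text.
            field["search_analyzer"] = field["analyzer"]
            # Documents are indexed with a plain analyzer, without synonyms.
            field["analyzer"] = index_analyzer
    return result


# The "properties" section from the question's mapping.
original = {
    "name": {"type": "string"},
    "title": {"type": "string", "analyzer": "my_synonyms"},
}

updated = to_query_time(original)
```

The resulting dictionary would be sent as the "properties" of the mapping request; fields without an analyzer, such as name, are left untouched.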
answered Oct 26 '22 03:10 by Val


To use the custom synonym filter at query time instead of index time, first remove the analyzer from your mapping:

PUT my_synonyms_test/rulers/_mapping
{
  "properties": {
    "name": {
      "type": "string"
    },
    "title": {
      "type": "string"
    }
  }
}

You can then use the analyzer that makes use of the custom synonym filter as part of a query_string query:

GET my_synonyms_test/rulers/_search
{
  "query": {
      "query_string": {
         "default_field": "title",
         "query": "monarch",
         "analyzer": "my_synonyms"
      }
  }
}

The query_string query accepts an analyzer parameter because it uses a query parser to parse its content. It is not the only query that does: the match query also accepts an analyzer parameter, so the same override works there.
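
As far as I'm aware, the match query also takes a per-request analyzer, so the search from the question can keep using match. The Python sketch below only builds the request body; match_with_analyzer is a hypothetical helper name, and the field, text and analyzer values are the ones from the question.

```python
import json


def match_with_analyzer(field, text, analyzer):
    """Build a match query body that names the analyzer to apply to the
    query text for this request only (hypothetical helper)."""
    return {
        "query": {
            "match": {
                field: {
                    "query": text,
                    "analyzer": analyzer,
                }
            }
        }
    }


body = match_with_analyzer("title", "monarch", "my_synonyms")

# Send this as the body of: GET my_synonyms_test/rulers/_search
print(json.dumps(body, indent=2))
```

Compared with search_analyzer in the mapping, this keeps the choice of analyzer in the application code, which may or may not be what you want.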

As you said, when using the analyzer only at query time, you won't need to re-index on every change to your synonyms collection.
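
To make the reasoning concrete, here is a toy Python model of query-time expansion. It is not Elasticsearch code: tokenize, expand, build_index and search are hypothetical stand-ins for the analyzer and the inverted index, and only single-word synonyms in the "a,b" rule format are handled. The point it illustrates is that the index is built once without synonyms, and editing the rules only changes how the query is expanded.

```python
import re


def tokenize(text):
    """Lowercase and split on non-alphanumerics (a rough stand-in for
    the standard tokenizer plus the lowercase filter)."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def expand(tokens, synonym_rules):
    """Add every term that shares a rule such as 'queen,monarch' with a token."""
    groups = [{term.strip().lower() for term in rule.split(",")}
              for rule in synonym_rules]
    expanded = set(tokens)
    for token in tokens:
        for group in groups:
            if token in group:
                expanded |= group
    return expanded


def build_index(docs):
    """Index documents WITHOUT synonyms: term -> set of document ids."""
    index = {}
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index.setdefault(token, set()).add(doc_id)
    return index


def search(index, query, synonym_rules):
    """Expand the query terms with the current rules, then look them up."""
    hits = set()
    for term in expand(tokenize(query), synonym_rules):
        hits |= index.get(term, set())
    return hits


docs = {"1": "Queen", "2": "King"}
index = build_index(docs)  # built once, never rebuilt below

rules = ["queen,monarch"]
first = search(index, "monarch", rules)

rules.append("king,monarch")  # an admin adds a rule; no reindexing
second = search(index, "monarch", rules)
```

Here first contains only document 1, while second also contains document 2, even though the index was never touched after the rule change.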

answered Oct 26 '22 04:10 by Andrea Singh