Elasticsearch: search_analyzer vs index_analyzer

I was looking at http://euphonious-intuition.com/2012/08/more-complicated-mapping-in-elasticsearch/ which explains ElasticSearch analyzers.

I did not understand the part about having different search and index analyzers. The second custom-mapping example goes like this:
  • the index analyzer uses edgeNGram
  • the search analyzer is:

"full_name": {
    "filter": [
        "standard",
        "lowercase",
        "asciifolding"
    ],
    "type": "custom",
    "tokenizer": "standard"
}

If we want the query "Race" not to return results like *ra*pport and *rac*ial because of edgeNGram, why index with edgeNGram in the first place?

Please explain with an example where different analyzers are useful.

Asked by Pavan K Mutt, Apr 10 '13 10:04



2 Answers

You usually have a similar analysis chain at both index time and query time. Similar doesn't mean exactly the same, but the way you index documents usually reflects the way you query them.

The ngrams example is a really good fit though, since it's one of the main reasons why you would use different analyzers at index and query time.
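As a rough illustration of that setup, here is a sketch of an index-creation body written as a Python dict. It assumes a recent Elasticsearch version: since 2.0 there is no `index_analyzer`; the field's `analyzer` is used at index time and `search_analyzer` overrides it for query text. The `standard` token filter from the question is dropped because later versions removed it. The names `autocomplete_filter`, `autocomplete`, `full_name` and the field `name` are hypothetical.

```python
import json

# Hypothetical names throughout: "autocomplete_filter", "autocomplete",
# "full_name" and the field "name" are for illustration only.
index_body = {
    "settings": {
        "analysis": {
            "filter": {
                # Emits prefixes of each token, 3 to 20 characters long
                "autocomplete_filter": {
                    "type": "edge_ngram",
                    "min_gram": 3,
                    "max_gram": 20,
                }
            },
            "analyzer": {
                # Index time: split into words, normalize, then add edge ngrams
                "autocomplete": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "asciifolding", "autocomplete_filter"],
                },
                # Search time: same normalization, but no ngrams
                "full_name": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "asciifolding"],
                },
            },
        }
    },
    "mappings": {
        "properties": {
            "name": {
                "type": "text",
                "analyzer": "autocomplete",      # applied when indexing
                "search_analyzer": "full_name",  # applied to the query text
            }
        }
    },
}

print(json.dumps(index_body, indent=2))
```

The only difference between the two analyzers is the ngram filter, which is exactly the asymmetry the question asks about.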

For partial matches you index with edge ngrams, so that "elasticsearch" becomes (with min_gram 3 and max_gram 20):

"ela", "elas", "elast", "elasti", "elastic", "elastics", "elasticse", "elasticsea", "elasticsear", "elasticsearc" and "elasticsearch"
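The list above is just every prefix of the token whose length falls between min_gram and max_gram. A plain-Python imitation for a single, already-lowercased token (a sketch to reproduce the list, not Elasticsearch's implementation):

```python
def edge_ngrams(term: str, min_gram: int = 3, max_gram: int = 20) -> list[str]:
    """Return the prefixes of `term` whose length is in [min_gram, max_gram]."""
    upper = min(max_gram, len(term))  # never longer than the token itself
    return [term[:n] for n in range(min_gram, upper + 1)]


print(edge_ngrams("elasticsearch"))
print(edge_ngrams("elapsed"))
```

A token shorter than min_gram yields no grams at all in this sketch.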

Let's now query the field. If we query for the term "elastic" there's a match and we get back the expected result. Given what we indexed, what we called a partial match above has effectively become an exact match. There's no need to apply ngrams to the query too. If we did, we would query for all of the following terms:

"ela", "elas","elast","elasti" and "elastic"

That would make the query much more complex and would also lead to odd results. Say you index the term "elapsed" in another document, in the same field. You would get the following ngrams:

"ela", "elap", "elaps", "elapse", "elapsed"

If you search for "elastic" and apply ngrams to the query, the term "ela" would match this second document too, so you would get it back together with the first document, even though none of its terms contains the whole term "elastic" you were looking for.
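The two documents above can be replayed with a toy inverted index. This is a sketch, not Lucene: scoring is ignored and a document matches if any query term is found, similar to the default `or` operator of a match query.

```python
from collections import defaultdict


def edge_ngrams(term, min_gram=3, max_gram=20):
    # Prefixes of `term` with length in [min_gram, max_gram]
    return [term[:n] for n in range(min_gram, min(max_gram, len(term)) + 1)]


docs = {1: "elasticsearch", 2: "elapsed"}

# Index time: every document term is expanded into its edge ngrams
index = defaultdict(set)
for doc_id, text in docs.items():
    for gram in edge_ngrams(text):
        index[gram].add(doc_id)


def search(query_terms):
    # A document matches if any query term appears in the index
    hits = set()
    for term in query_terms:
        hits |= index.get(term, set())
    return hits


# Query text left intact (search analyzer without ngrams): only doc 1
print(sorted(search(["elastic"])))
# Query ngrammed like the index: "ela" also pulls in doc 2 ("elapsed")
print(sorted(search(edge_ngrams("elastic"))))
```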

I suggest having a look at the analyze API to play around with different analyzers and compare their output.
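A sketch of such an analyze API call using only the Python standard library. It assumes a cluster at `http://localhost:9200` (hypothetical) and a version whose `_analyze` endpoint accepts a JSON body with `tokenizer`, `filter` (including an inline filter definition) and `text`; check the analyze API reference for your version. The request is only built here, not sent.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: local cluster address


def build_analyze_request(text, min_gram=3, max_gram=20):
    # Ad-hoc analysis chain: standard tokenizer, lowercase, inline edge_ngram
    body = {
        "tokenizer": "standard",
        "filter": [
            "lowercase",
            {"type": "edge_ngram", "min_gram": min_gram, "max_gram": max_gram},
        ],
        "text": text,
    }
    return urllib.request.Request(
        f"{ES_URL}/_analyze",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_analyze_request("elasticsearch")
# To send it (needs a running cluster):
# with urllib.request.urlopen(req) as resp:
#     print([t["token"] for t in json.load(resp)["tokens"]])
```

Dropping the edge_ngram entry from `filter` shows what a search analyzer without ngrams would produce for the same text.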

Answered by javanna, Oct 02 '22 08:10


To reference the official documentation about index vs search analyzers:

Occasionally, it makes sense to use a different analyzer at index and search time. For instance, at index time we may want to index synonyms, e.g. for every occurrence of quick we also index fast, rapid and speedy. But at search time, we don’t need to search for all of these synonyms. Instead we can just look up the single word that the user has entered, be it quick, fast, rapid or speedy.
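The synonym case can be sketched in a few lines of Python: expansion happens once, on the index side, and the search side only lowercases and splits. The synonym table holds just the quoted "quick" example and is hypothetical; real analyzers tokenize far more carefully than `str.split`.

```python
SYNONYMS = {"quick": ["fast", "rapid", "speedy"]}  # hypothetical synonym table


def index_analyze(text):
    # Index time: emit each token plus the synonyms it expands to
    tokens = []
    for tok in text.lower().split():
        tokens.append(tok)
        tokens.extend(SYNONYMS.get(tok, []))
    return tokens


def search_analyze(text):
    # Search time: no expansion, just the word the user typed
    return text.lower().split()


doc_tokens = set(index_analyze("The quick fox"))
for q in ["quick", "fast", "rapid", "speedy"]:
    print(q, all(t in doc_tokens for t in search_analyze(q)))
```

Each query stays a single term, yet all four words find the document because the index already contains every variant.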

To enable this distinction, Elasticsearch also supports the index_analyzer and search_analyzer parameters, and analyzers named default_index and default_search.

Taking these extra parameters into account, the full sequence at index time really looks like this:

  • the index_analyzer defined in the field mapping, else
  • the analyzer defined in the field mapping, else
  • the analyzer defined in the _analyzer field of the document, else
  • the default index_analyzer for the type, which defaults to
  • the default analyzer for the type, which defaults to
  • the analyzer named default_index in the index settings, which defaults to
  • the analyzer named default in the index settings, which defaults to
  • the analyzer named default_index at node level, which defaults to
  • the analyzer named default at node level, which defaults to
  • the standard analyzer

And at search time:

  • the analyzer defined in the query itself, else
  • the search_analyzer defined in the field mapping, else
  • the analyzer defined in the field mapping, else
  • the default search_analyzer for the type, which defaults to
  • the default analyzer for the type, which defaults to
  • the analyzer named default_search in the index settings, which defaults to
  • the analyzer named default in the index settings, which defaults to
  • the analyzer named default_search at node level, which defaults to
  • the analyzer named default at node level, which defaults to
  • the standard analyzer
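The search-time list is a "first one that is set wins" lookup. A sketch of that order in Python; the argument shapes (plain dicts for the field mapping and type defaults, sets of analyzer names for index and node settings) are hypothetical, and the function only models the quoted order rather than calling Elasticsearch. The index-time list follows the same pattern with `index_analyzer` and the document's `_analyzer` field in place of the query and `search_analyzer`.

```python
def resolve_search_analyzer(query_analyzer=None, field_mapping=None,
                            type_defaults=None, index_analyzers=(),
                            node_analyzers=()):
    """Return (source, analyzer_name) following the quoted search-time order.

    index_analyzers / node_analyzers: names of analyzers defined in the
    index settings / at node level (hypothetical inputs).
    """
    field_mapping = field_mapping or {}
    type_defaults = type_defaults or {}
    candidates = [
        ("query", query_analyzer),
        ("field search_analyzer", field_mapping.get("search_analyzer")),
        ("field analyzer", field_mapping.get("analyzer")),
        ("type search_analyzer", type_defaults.get("search_analyzer")),
        ("type analyzer", type_defaults.get("analyzer")),
    ]
    # Named defaults: index level before node level, default_search before default
    for source, names in (("index", index_analyzers), ("node", node_analyzers)):
        for name in ("default_search", "default"):
            candidates.append((source, name if name in names else None))
    for source, name in candidates:
        if name:
            return source, name
    return "built-in", "standard"  # final fallback


print(resolve_search_analyzer(field_mapping={"analyzer": "autocomplete",
                                             "search_analyzer": "full_name"}))
print(resolve_search_analyzer(index_analyzers={"default"}))
print(resolve_search_analyzer())
```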

Answered by Asimov4, Oct 02 '22 07:10