
Elasticsearch: search_analyzer vs index_analyzer

Tags:

search

elasticsearch

analyzer

I was looking at http://euphonious-intuition.com/2012/08/more-complicated-mapping-in-elasticsearch/ which explains ElasticSearch analyzers.

I did not understand the part about having different search and index analyzers. The second example of custom mapping goes like this:
->the index analyzer is an edgeNgram
->the search analyzer is:

    "full_name":{
        "filter":[
            "standard",
            "lowercase",
            "asciifolding"
        ],
        "type":"custom",
        "tokenizer":"standard"
    }

If we want the query "Race" not to return results like *ra*pport and *rac*ial, which edgeNgram would otherwise produce, why index with edgeNgram in the first place?

Please explain with an example where different analyzers are useful.


asked Apr 10 '13 10:04

Pavan K Mutt

2 Answers

You usually have a similar analysis chain at both index time and query time. Similar doesn't mean exactly the same, but the way you index documents usually reflects the way you query them.

The ngram example is a really good one though, since partial matching is one of the main reasons why you would use different analyzers at index and query time.

For partial matches you index with edge ngrams, so that "elasticsearch" becomes (with min_gram 3 and max_gram 20):

"ela", "elas", "elast", "elasti", "elastic", "elastics", "elasticse", "elasticsea", "elasticsear", "elasticsearc" and "elasticsearch"

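As a rough illustration of what the edge ngram step produces, here is a minimal Python sketch. The function name `edge_ngrams` is made up for this example and only mimics the behaviour described above; it is not Elasticsearch code.

```python
def edge_ngrams(term: str, min_gram: int = 3, max_gram: int = 20) -> list[str]:
    """Return the prefixes of `term` whose length lies in [min_gram, max_gram]."""
    # Hypothetical helper: a term shorter than min_gram yields no tokens at all.
    longest = min(len(term), max_gram)
    return [term[:n] for n in range(min_gram, longest + 1)]


print(edge_ngrams("elasticsearch"))
```

Each prefix becomes its own term in the index, which is why a later lookup of a prefix such as "elastic" can be a plain exact-term match.
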
Let's now query the field we created. If we query for the term "elastic", there's a match and we get back the expected result. Given what we indexed, we have basically turned what we called a partial match above into an exact match. There's no need to apply ngrams to the query too. If we did, we would query for all of the following terms:

"ela", "elas","elast","elasti" and "elastic"

That would make the query far more complex and would lead to odd results as well. Let's say you index the term "elapsed" in another document, in the same field. You would get the following ngrams:

"ela", "elap", "elaps", "elapse", "elapsed"

If you search for "elastic" and apply ngrams to the query, the term "ela" would match this second document too, so you would get it back together with the first document, even though none of its terms contains the whole term "elastic" you were looking for.
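To make the difference concrete, here is a small self-contained simulation in Python. It is only a toy inverted index, not how Elasticsearch is implemented; `edge_ngrams`, `build_index` and `search` are hypothetical helpers, and the two documents are the "elasticsearch" and "elapsed" examples from above.

```python
from collections import defaultdict


def edge_ngrams(term, min_gram=3, max_gram=20):
    """Prefixes of `term` with length between min_gram and max_gram."""
    longest = min(len(term), max_gram)
    return [term[:n] for n in range(min_gram, longest + 1)]


def build_index(docs):
    """Index time: map every edge ngram of every word to the ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            for gram in edge_ngrams(word):
                index[gram].add(doc_id)
    return index


def search(index, query, ngram_query=False):
    """Search time: look up the query words as-is, or (for comparison) their edge ngrams."""
    hits = set()
    for word in query.lower().split():
        terms = edge_ngrams(word) if ngram_query else [word]
        for term in terms:
            hits |= index.get(term, set())
    return hits


docs = {1: "elasticsearch", 2: "elapsed"}
index = build_index(docs)

print(search(index, "elastic"))                    # plain search analyzer
print(search(index, "elastic", ngram_query=True))  # ngrams applied to the query too
```

With the plain search analyzer only document 1 comes back. With ngrams on the query side, the "ela" term pulls in document 2 as well, which is exactly the kind of result the separate search analyzer is meant to avoid.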

I would suggest having a look at the analyze API to play around with different analyzers and compare their results.
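For instance, a request to the analyze API could be put together roughly like this. This is a sketch under assumptions: it assumes a reasonably recent Elasticsearch node at `http://localhost:9200`, where the `_analyze` endpoint accepts inline filter definitions and the filter type is spelled `edge_ngram` (older releases use different spellings and request shapes). The helper names `build_analyze_body` and `analyze` are made up for this example.

```python
import json
import urllib.request


def build_analyze_body(text, min_gram=3, max_gram=20):
    """Request body: standard tokenizer, then lowercase and an inline edge ngram filter."""
    return {
        "tokenizer": "standard",
        "filter": [
            "lowercase",
            {"type": "edge_ngram", "min_gram": min_gram, "max_gram": max_gram},
        ],
        "text": text,
    }


def analyze(text, base_url="http://localhost:9200"):
    """POST the body to <base_url>/_analyze and return the tokens it reports.

    base_url is an assumption; point it at your own node.
    """
    request = urllib.request.Request(
        f"{base_url}/_analyze",
        data=json.dumps(build_analyze_body(text)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return [t["token"] for t in json.loads(response.read())["tokens"]]


if __name__ == "__main__":
    # Only prints the request body; call analyze("elasticsearch") against a running node.
    print(json.dumps(build_analyze_body("elasticsearch"), indent=2))
```

Running the same text through the index-side chain and the search-side chain and comparing the two token lists is a quick way to see whether a query term can actually match what was indexed.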

answered Oct 02 '22 08:10

javanna

To reference the official documentation about index vs search analyzers:

Occasionally, it makes sense to use a different analyzer at index and search time. For instance, at index time we may want to index synonyms, eg for every occurrence of quick we also index fast, rapid and speedy. But at search time, we don’t need to search for all of these synonyms. Instead we can just look up the single word that the user has entered, be it quick, fast, rapid or speedy.

To enable this distinction, Elasticsearch also supports the index_analyzer and search_analyzer parameters, and analyzers named default_index and default_search.

Taking these extra parameters into account, the full sequence at index time really looks like this:

the index_analyzer defined in the field mapping, else

the analyzer defined in the field mapping, else

the analyzer defined in the _analyzer field of the document, else

the default index_analyzer for the type, which defaults to

the default analyzer for the type, which defaults to

the analyzer named default_index in the index settings, which defaults to

the analyzer named default in the index settings, which defaults to

the analyzer named default_index at node level, which defaults to

the analyzer named default at node level, which defaults to

the standard analyzer

And at search time:

the analyzer defined in the query itself, else

the search_analyzer defined in the field mapping, else

the analyzer defined in the field mapping, else

the default search_analyzer for the type, which defaults to

the default analyzer for the type, which defaults to

the analyzer named default_search in the index settings, which defaults to

the analyzer named default in the index settings, which defaults to

the analyzer named default_search at node level, which defaults to

the analyzer named default at node level, which defaults to

the standard analyzer
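
Both sequences quoted above follow the same pattern: walk an ordered list of places where an analyzer may be configured and take the first one that is set, ending at the standard analyzer. A minimal Python sketch of that fallback logic follows; `resolve_analyzer` and the analyzer names `full_name_search` and `full_name_index` are hypothetical, and this is not how Elasticsearch implements it internally.

```python
def resolve_analyzer(candidates, fallback="standard"):
    """Return (source, analyzer name) for the first configured candidate, in priority order."""
    for source, name in candidates:
        if name is not None:
            return source, name
    # Nothing configured anywhere: fall through to the standard analyzer.
    return "built-in default", fallback


# Search-time lookup for a field whose mapping sets both a search and an index analyzer.
search_time = [
    ("analyzer in the query", None),
    ("search_analyzer in the field mapping", "full_name_search"),
    ("analyzer in the field mapping", "full_name_index"),
    ("default_search in the index settings", None),
    ("default in the index settings", None),
]

print(resolve_analyzer(search_time))
print(resolve_analyzer([("analyzer in the query", None)]))
```

Because the search_analyzer entry sits above the plain analyzer entry in the search-time list, setting it is enough to make queries use a different chain from the one used for indexing.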

answered Oct 02 '22 07:10

Asimov4
