Analyzers in elasticsearch

Tags: elasticsearch, analyzer, tire

I'm having trouble understanding the concept of analyzers in Elasticsearch with the tire gem. I'm new to these search concepts. Can someone point me to a reference article, or explain what analyzers actually do and why they are used?

I see different analyzers mentioned for Elasticsearch, such as keyword, standard, simple, and snowball. Without understanding analyzers, I can't work out which one fits my needs.

740

asked Oct 11 '12 09:10

Vamsi Krishna

2 Answers

Let me give you a short answer.

An analyzer is used at index time and at search time. At index time it turns text into the terms that are stored in the index; at search time it turns the query text into terms to look up.

To index a phrase, it is useful to break it into words. This is where the analyzer comes in.

It applies a tokenizer and token filters. A tokenizer could be, for example, the whitespace tokenizer, which splits a phrase into tokens at each whitespace character. The lowercase tokenizer splits a phrase at each non-letter and lowercases all letters.
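
To make the difference between those two tokenizers concrete, here is a rough Python sketch. It only imitates the behaviour described above; the function names are made up for illustration and this is not how Elasticsearch implements them.

```python
import re

def whitespace_tokenize(text):
    """Split on runs of whitespace; case and punctuation are left untouched."""
    return text.split()

def lowercase_tokenize(text):
    """Split at every non-letter character and lowercase what remains."""
    # [^\W\d_]+ matches runs of letters (word characters minus digits and underscore).
    return [token.lower() for token in re.findall(r"[^\W\d_]+", text)]

phrase = "The Quick-Brown fox's 2 dens"
print(whitespace_tokenize(phrase))  # ['The', 'Quick-Brown', "fox's", '2', 'dens']
print(lowercase_tokenize(phrase))   # ['the', 'quick', 'brown', 'fox', 's', 'dens']
```

Note how the second tokenizer drops the digit and splits on the hyphen and apostrophe, so the choice of tokenizer already decides which terms can ever match.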

A token filter is used to filter or convert tokens. For example, an ASCII folding filter converts characters like ê, é, è to e.
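
As an approximation of what ASCII folding does, the sketch below decomposes each character and drops the accent marks. The real filter covers more characters than this (letters that have no Unicode decomposition are left alone here), and `ascii_fold` is just an illustrative name.

```python
import unicodedata

def ascii_fold(token):
    """Strip combining marks so accented letters map to their base letter."""
    decomposed = unicodedata.normalize("NFKD", token)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

tokens = ["été", "forêt", "crème", "cafe"]
print([ascii_fold(t) for t in tokens])  # ['ete', 'foret', 'creme', 'cafe']
```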

An analyzer is a combination of all of these.
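
Here is a toy sketch of why the same analyzer matters at both index time and search time. It builds a tiny term index in plain Python; the analyzer is a simplified stand-in (split at non-letters, lowercase), and none of the names come from Elasticsearch.

```python
import re
from collections import defaultdict

def analyze(text):
    """Toy analyzer: split at non-letters, then lowercase each token."""
    return [token.lower() for token in re.findall(r"[^\W\d_]+", text)]

def build_index(docs):
    """Index time: map each term to the ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyze(text):
            index[term].add(doc_id)
    return index

def search(index, query):
    """Search time: analyze the query the same way, then match every term."""
    terms = analyze(query)
    if not terms:
        return set()
    return set.intersection(*(index.get(term, set()) for term in terms))

docs = {1: "Quick brown FOX", 2: "The lazy dog", 3: "A quick dog"}
index = build_index(docs)
print(sorted(search(index, "QUICK")))      # [1, 3]
print(sorted(search(index, "quick dog")))  # [3]
```

The query "QUICK" only matches because it goes through the same lowercasing as the documents did; with a different analyzer on one side the terms would not line up.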

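You should read the Analysis guide and look at all the different options listed on the right.

By default, Elasticsearch applies the standard analyzer, which splits text on word boundaries and lowercases the tokens. Older releases also removed common English stop words by default; in current releases stop-word removal is disabled unless you configure it.

You can also use the Analyze API to see how an analyzer processes your text. It's very useful.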
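
For example, a request to the `_analyze` endpoint can be put together like this. This is a minimal sketch that assumes a node listening on `http://localhost:9200` and the JSON request body accepted by recent versions (very old versions took the text and analyzer as query parameters instead). The helper names are mine, and the actual send is left commented out so the snippet runs without a cluster.

```python
import json
import urllib.request

def build_analyze_request(text, analyzer="standard", base_url="http://localhost:9200"):
    """Build a POST request for the _analyze endpoint without sending it."""
    body = json.dumps({"analyzer": analyzer, "text": text}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/_analyze",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def extract_tokens(response_json):
    """Pull the token strings out of an _analyze response body."""
    return [entry["token"] for entry in response_json["tokens"]]

request = build_analyze_request("The QUICK brown foxes")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))

# To actually send it (requires a running node at base_url):
# with urllib.request.urlopen(request) as response:
#     print(extract_tokens(json.load(response)))
```

Swapping the `analyzer` argument for `keyword`, `simple`, or `snowball` and comparing the returned tokens is a quick way to see which one fits your data.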

127

answered Sep 21 '22 07:09

dadoonet

In Lucene, an analyzer is a combination of a tokenizer (splitter), a stemmer, and a stopword filter.

In Elasticsearch, an analyzer is a combination of:

Character filter: "tidies up" a string before it is tokenized, e.g. removes HTML tags. An analyzer can have zero or more.
Tokenizer: breaks the string up into individual terms or tokens. An analyzer must have exactly one.
Token filter: changes, adds, or removes tokens. An analyzer can have zero or more. A stemmer is an example of a token filter: it reduces a word to its base form, e.g. happy and happiness both have the same base, happi.
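
The three stages can be sketched in plain Python as below. Everything here is a simplified stand-in: the stop word list is a made-up sample and `naive_stem` is crude suffix stripping, not the Snowball algorithm.

```python
import re

def strip_html(text):
    """Character filter: replace anything that looks like an HTML tag."""
    return re.sub(r"<[^>]+>", " ", text)

def tokenize(text):
    """Tokenizer (exactly one per analyzer): here, runs of letters."""
    return re.findall(r"[^\W\d_]+", text)

def lowercase(tokens):
    """Token filter that changes tokens."""
    return [t.lower() for t in tokens]

STOPWORDS = {"the", "a", "an", "and", "of"}  # illustrative sample only

def remove_stopwords(tokens):
    """Token filter that removes tokens."""
    return [t for t in tokens if t not in STOPWORDS]

def naive_stem(tokens):
    """Token filter: crude suffix stripping, NOT the real Snowball stemmer."""
    stemmed = []
    for t in tokens:
        for suffix in ("ness", "ing", "s"):
            if t.endswith(suffix) and len(t) > len(suffix) + 2:
                t = t[: -len(suffix)]
                break
        stemmed.append(t)
    return stemmed

def analyze(text, char_filters, tokenizer, token_filters):
    """Run char filters, then the single tokenizer, then token filters in order."""
    for char_filter in char_filters:
        text = char_filter(text)
    tokens = tokenizer(text)
    for token_filter in token_filters:
        tokens = token_filter(tokens)
    return tokens

result = analyze(
    "<p>The <b>Kindness</b> of Strangers</p>",
    char_filters=[strip_html],
    tokenizer=tokenize,
    token_filters=[lowercase, remove_stopwords, naive_stem],
)
print(result)  # ['kind', 'stranger']
```

The order of the token filters matters: lowercasing has to run before the stop word filter here, otherwise "The" would slip through.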

See Snowball demo here

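This is a sample setting. Note that the "standard" token filter was removed in Elasticsearch 7.0, so drop it from the filter list on newer versions: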

    {
      "settings": {
        "index": {
          "analysis": {
            "analyzer": {
              "analyzerWithSnowball": {
                "tokenizer": "standard",
                "filter": ["standard", "lowercase", "englishSnowball"]
              }
            },
            "filter": {
              "englishSnowball": {
                "type": "snowball",
                "language": "english"
              }
            }
          }
        }
      }
    }
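
To try a setting like this, you could create an index with it and then run the Analyze API against that index using the custom analyzer's name. The sketch below only builds the two requests. The index name `my-index` and the `localhost:9200` address are assumptions, and the "standard" token filter is left out because it is no longer available in Elasticsearch 7.0 and later.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9200"  # assumption: local single-node cluster
INDEX = "my-index"                  # hypothetical index name

SETTINGS = {
    "settings": {
        "index": {
            "analysis": {
                "analyzer": {
                    "analyzerWithSnowball": {
                        "tokenizer": "standard",
                        "filter": ["lowercase", "englishSnowball"],
                    }
                },
                "filter": {
                    "englishSnowball": {"type": "snowball", "language": "english"}
                },
            }
        }
    }
}

def json_request(method, url, payload):
    """Build (but do not send) a JSON request for the given method and URL."""
    return urllib.request.Request(
        url=url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method=method,
    )

# 1. Create the index with the custom analysis settings.
create_index = json_request("PUT", f"{BASE_URL}/{INDEX}", SETTINGS)
# 2. Ask that index to analyze some text with the custom analyzer.
analyze = json_request(
    "POST",
    f"{BASE_URL}/{INDEX}/_analyze",
    {"analyzer": "analyzerWithSnowball", "text": "happy happiness"},
)

for request in (create_index, analyze):
    print(request.get_method(), request.full_url)

# To run against a live node, send each request in order:
# for request in (create_index, analyze):
#     with urllib.request.urlopen(request) as response:
#         print(json.load(response))
```

If the stemmer behaves as described above, both words in the sample text should come back as the same token.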

Ref:

Comparison of Lucene Analyzers
http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/custom-analyzers.html

answered Sep 23 '22 07:09

Tho
