I am evaluating Elasticsearch for a client. I have begun playing with the API and successfully created an index and added documents to it. The main reason for using Elasticsearch is that it provides facets functionality.
I am having trouble understanding Analyzers, Tokenizers, and Filters, and how they fit in with facets. I want to be able to use keywords, dates, search terms, etc. as my facets.
How would I go about incorporating analyzers into my search, and how can I use them with facets?
A facet is a tool that your users can use to further narrow search results to their liking. It generates a count for each value or range found in a field of your documents.
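For example, a terms facet counts how often each value of a field occurs across the matched documents. Here is a minimal sketch in Python with the requests library, assuming a local node on localhost:9200 and a hypothetical "articles" index with a "tags" field; note this uses the legacy facets DSL from the era of this question (facets were later replaced by aggregations):

    import requests

    # Legacy terms facet: count how many matching documents carry each tag.
    # "articles" and "tags" are hypothetical names; adjust to your schema.
    resp = requests.post(
        "http://localhost:9200/articles/_search",
        json={
            "query": {"match_all": {}},
            "facets": {
                "tag_counts": {"terms": {"field": "tags"}}
            },
        },
    )

    # Each entry is one value of the "tags" field plus its document count.
    for entry in resp.json()["facets"]["tag_counts"]["terms"]:
        print(entry["term"], entry["count"])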
Elasticsearch is a distributed search and analytics engine built on Apache Lucene. Developed by Elasticsearch N.V. (now Elastic) and first released in 2010, it is free, open source, and can handle any type of data, structured or unstructured, textual or numerical. It has become one of the most widely used search engines and is commonly used for log analytics, full-text search, security intelligence, business analytics, and operational intelligence.
Elasticsearch runs Lucene under the hood, so by default it uses Lucene's practical scoring function: a similarity model based on term frequency (tf) and inverse document frequency (idf) that also uses the vector space model (VSM) for multi-term queries.
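For reference, the Lucene documentation gives the practical scoring function (for its TFIDFSimilarity) roughly as follows; the formula below is paraphrased from those docs as background, not something specific to this question:

    \mathrm{score}(q,d) = \mathrm{queryNorm}(q) \cdot \mathrm{coord}(q,d) \cdot
        \sum_{t \in q} \mathrm{tf}(t,d) \cdot \mathrm{idf}(t)^{2} \cdot \mathrm{boost}(t) \cdot \mathrm{norm}(t,d)

    \quad\text{with}\quad
    \mathrm{tf}(t,d) = \sqrt{\mathrm{freq}(t,d)}, \qquad
    \mathrm{idf}(t) = 1 + \ln\frac{\mathrm{numDocs}}{\mathrm{docFreq}(t) + 1}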
When Elasticsearch indexes a string, by default it breaks it up into tokens. For example, "Fox jump over the wall" is tokenized into the individual words "fox", "jump", "over", "the" and "wall" (the default standard analyzer also lowercases each token).
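You can see this for yourself with the Analyze API. A minimal sketch, assuming a node on localhost:9200 (the exact request format of _analyze has changed across Elasticsearch versions; this is the JSON-body form):

    import requests

    # Ask Elasticsearch which tokens the standard analyzer produces.
    resp = requests.post(
        "http://localhost:9200/_analyze",
        json={"analyzer": "standard", "text": "Fox jump over the wall"},
    )
    print([t["token"] for t in resp.json()["tokens"]])
    # ['fox', 'jump', 'over', 'the', 'wall'] -- note the lowercasing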
So what does this mean for search? If you query your documents expecting the whole string to match, you may not get the result you want, because Elasticsearch matches against the indexed tokens rather than the entire string, and this can significantly affect your results.
For example, a search for the exact term "Fox jump over the wall" will not return any results, while a search for the single token "fox" will.
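Here is a sketch of that difference, again with hypothetical index and field names ("articles", "title"); a term query looks the query string up verbatim in the inverted index, so only a token that was actually indexed can match:

    import requests

    base = "http://localhost:9200/articles/_search"

    # The whole sentence was never indexed as a single term, so this finds nothing.
    whole = requests.post(base, json={
        "query": {"term": {"title": "Fox jump over the wall"}}
    })

    # A single (lowercased) token does exist in the index, so this matches.
    token = requests.post(base, json={
        "query": {"term": {"title": "fox"}}
    })

    print(whole.json()["hits"]["total"])  # 0
    print(token.json()["hits"]["total"])  # >= 1 (an object in newer versions)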
What tells Elasticsearch not to tokenize an indexed string is the mapping: declaring a field as not_analyzed (or giving it the keyword analyzer) stores the whole value as a single term, so that you can properly search for exact strings. This is particularly useful when you want to run terms facets on entire strings rather than on individual words. (The Analyze API, by contrast, does not change indexing at all; it only shows you which tokens a given analyzer would produce, which is handy for debugging.)
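A minimal sketch of such a mapping, using the pre-2.x syntax that matches the facet era (index name "articles" and type name "article" are hypothetical):

    import requests

    # "index": "not_analyzed" stores the whole string as one term, so a
    # terms facet on "tags" counts complete values, not word fragments.
    requests.put("http://localhost:9200/articles", json={
        "mappings": {
            "article": {
                "properties": {
                    "tags": {"type": "string", "index": "not_analyzed"}
                }
            }
        }
    })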
A tokenizer splits a string into individual tokens (usually words), and it is these tokens that Elasticsearch stores and that your queries are matched against via the Search API. An analyzer is the complete chain: optional character filters, exactly one tokenizer, and zero or more token filters.
Filters come in two flavours here. Token filters are the last stage of an analyzer: they transform the tokens the tokenizer emits, for example lowercasing them or removing stopwords. Query filters are a separate concept at search time: they create a subset of your results under conditions you specify, helping you separate what you need from what you do not need.
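To tie the three together, here is a sketch of a custom analyzer declared in the index settings: one tokenizer plus a chain of token filters. The index name "my_index" is hypothetical; "lowercase" and "stop" are built-in token filters:

    import requests

    # A custom analyzer = tokenizer + token filters, defined at index creation.
    requests.put("http://localhost:9200/my_index", json={
        "settings": {
            "analysis": {
                "analyzer": {
                    "my_analyzer": {
                        "type": "custom",
                        "tokenizer": "standard",
                        "filter": ["lowercase", "stop"]  # applied in order
                    }
                }
            }
        }
    })

Fields mapped to use my_analyzer will then be searchable token by token, while a parallel not_analyzed field (as shown earlier) can serve your facets.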