I am evaluating Elasticsearch for a client. I have begun playing with the API and successfully created an index and added documents to it. The main reason for using Elasticsearch is that it provides facets functionality.
I am having trouble understanding Analyzers, Tokenizers, and Filters, and how they fit in with facets. I want to be able to use keywords, dates, search terms, etc. as my facets.
How would I go about incorporating analyzers into my search, and how can I use them with facets?
A facet is a tool that your users can use to further narrow search results to their liking. It generates a count for each value or range found in a field of your documents.
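For example, a terms facet counts how often each value of a field occurs across the matched documents. Here is a minimal sketch in Python with the requests library, assuming a local node on localhost:9200 and a hypothetical "articles" index with a "tags" field; note this uses the legacy facets DSL from the era of this question (facets were later replaced by aggregations):

    import requests

    # Legacy terms facet: count how many matching documents carry each tag.
    # "articles" and "tags" are hypothetical names; adjust to your schema.
    resp = requests.post(
        "http://localhost:9200/articles/_search",
        json={
            "query": {"match_all": {}},
            "facets": {
                "tag_counts": {"terms": {"field": "tags"}}
            },
        },
    )

    # Each entry is one value of the "tags" field plus its document count.
    for entry in resp.json()["facets"]["tag_counts"]["terms"]:
        print(entry["term"], entry["count"])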
Elasticsearch is a distributed search and analytics engine built on Apache Lucene. Developed by Elasticsearch N.V. (now Elastic) and first released in 2010, it is free, open source, and can handle any type of data, structured or unstructured, textual or numerical. It has become one of the most widely used search engines and is commonly used for log analytics, full-text search, security intelligence, business analytics, and operational intelligence.
Elasticsearch runs Lucene under the hood, so by default it uses Lucene's practical scoring function: a similarity model based on term frequency (tf) and inverse document frequency (idf) that also uses the vector space model (VSM) for multi-term queries.
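For reference, the Lucene documentation gives the practical scoring function (for its TFIDFSimilarity) roughly as follows; the formula below is paraphrased from those docs as background, not something specific to this question:

    \mathrm{score}(q,d) = \mathrm{queryNorm}(q) \cdot \mathrm{coord}(q,d) \cdot
        \sum_{t \in q} \mathrm{tf}(t,d) \cdot \mathrm{idf}(t)^{2} \cdot \mathrm{boost}(t) \cdot \mathrm{norm}(t,d)

    \quad\text{with}\quad
    \mathrm{tf}(t,d) = \sqrt{\mathrm{freq}(t,d)}, \qquad
    \mathrm{idf}(t) = 1 + \ln\frac{\mathrm{numDocs}}{\mathrm{docFreq}(t) + 1}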
When Elasticsearch indexes a string, by default it breaks it up into tokens. For example, "Fox jump over the wall" is tokenized into the individual words "fox", "jump", "over", "the" and "wall" (the default standard analyzer also lowercases each token).
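You can see this for yourself with the Analyze API. A minimal sketch, assuming a node on localhost:9200 (the exact request format of _analyze has changed across Elasticsearch versions; this is the JSON-body form):

    import requests

    # Ask Elasticsearch which tokens the standard analyzer produces.
    resp = requests.post(
        "http://localhost:9200/_analyze",
        json={"analyzer": "standard", "text": "Fox jump over the wall"},
    )
    print([t["token"] for t in resp.json()["tokens"]])
    # ['fox', 'jump', 'over', 'the', 'wall'] -- note the lowercasing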
So what does this mean for search? If you query your documents expecting the whole string to match, you may not get the result you want, because Elasticsearch matches against the indexed tokens rather than the entire string, and this can significantly affect your results.
For example, a search for the exact term "Fox jump over the wall" will not return any results, while a search for the single token "fox" will.
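Here is a sketch of that difference, again with hypothetical index and field names ("articles", "title"); a term query looks the query string up verbatim in the inverted index, so only a token that was actually indexed can match:

    import requests

    base = "http://localhost:9200/articles/_search"

    # The whole sentence was never indexed as a single term, so this finds nothing.
    whole = requests.post(base, json={
        "query": {"term": {"title": "Fox jump over the wall"}}
    })

    # A single (lowercased) token does exist in the index, so this matches.
    token = requests.post(base, json={
        "query": {"term": {"title": "fox"}}
    })

    print(whole.json()["hits"]["total"])  # 0
    print(token.json()["hits"]["total"])  # >= 1 (an object in newer versions)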
What tells Elasticsearch not to tokenize an indexed string is the mapping: declaring a field as not_analyzed (or giving it the keyword analyzer) stores the whole value as a single term, so that you can properly search for exact strings. This is particularly useful when you want to run terms facets on entire strings rather than on individual words. (The Analyze API, by contrast, does not change indexing at all; it only shows you which tokens a given analyzer would produce, which is handy for debugging.)
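A minimal sketch of such a mapping, using the pre-2.x syntax that matches the facet era (index name "articles" and type name "article" are hypothetical):

    import requests

    # "index": "not_analyzed" stores the whole string as one term, so a
    # terms facet on "tags" counts complete values, not word fragments.
    requests.put("http://localhost:9200/articles", json={
        "mappings": {
            "article": {
                "properties": {
                    "tags": {"type": "string", "index": "not_analyzed"}
                }
            }
        }
    })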
A tokenizer splits a string into individual tokens (usually words), and it is these tokens that Elasticsearch stores and that your queries are matched against via the Search API. An analyzer is the complete chain: optional character filters, exactly one tokenizer, and zero or more token filters.
Filters come in two flavours here. Token filters are the last stage of an analyzer: they transform the tokens the tokenizer emits, for example lowercasing them or removing stopwords. Query filters are a separate concept at search time: they create a subset of your results under conditions you specify, helping you separate what you need from what you do not need.
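To tie the three together, here is a sketch of a custom analyzer declared in the index settings: one tokenizer plus a chain of token filters. The index name "my_index" is hypothetical; "lowercase" and "stop" are built-in token filters:

    import requests

    # A custom analyzer = tokenizer + token filters, defined at index creation.
    requests.put("http://localhost:9200/my_index", json={
        "settings": {
            "analysis": {
                "analyzer": {
                    "my_analyzer": {
                        "type": "custom",
                        "tokenizer": "standard",
                        "filter": ["lowercase", "stop"]  # applied in order
                    }
                }
            }
        }
    })

Fields mapped to use my_analyzer will then be searchable token by token, while a parallel not_analyzed field (as shown earlier) can serve your facets.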