
Elasticsearch Analyzers and Facets

I am evaluating Elasticsearch for a client. I have begun playing with the API and have successfully created an index and added documents to it. The main reason for using Elasticsearch is that it provides faceting functionality.

I am having trouble understanding Analyzers, Tokenizers, and Filters and how they fit in with facets. I want to be able to use keywords, dates, search terms, etc. as my facets.

How would I go about incorporating Analyzers into my search, and how can I use them with facets?

asked Jun 06 '12 by Gabbar


People also ask

What are facets in Elasticsearch?

A facet is a tool that your users can use to further tune search results to their liking. It will generate a count for a value or range based on a field within a schema.
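For instance, a terms facet returns a count per distinct value of a field across the matching documents. A minimal sketch against the classic facets API (facets were replaced by aggregations in later Elasticsearch versions; the products index and color field here are hypothetical):

    curl -XGET 'http://localhost:9200/products/_search?pretty' -d '{
      "query": { "match_all": {} },
      "facets": {
        "color_counts": { "terms": { "field": "color" } }
      }
    }'

The response carries each color value with its document count alongside the normal search hits.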

What is Elasticsearch used for?

Elasticsearch is a distributed search and analytics engine built on Apache Lucene. Since its release in 2010, Elasticsearch has quickly become the most popular search engine and is commonly used for log analytics, full-text search, security intelligence, business analytics, and operational intelligence use cases.

What is the concept behind Elasticsearch?

Elasticsearch is a NoSQL database and analytics engine, which can process any type of data, structured or unstructured, textual or numerical. Developed by Elasticsearch N.V. (now Elastic) and based on Apache Lucene, it is free, open-source, and distributed in nature.

What algorithm does Elasticsearch use?

Elasticsearch runs Lucene under the hood so by default it uses Lucene's Practical Scoring Function. This is a similarity model based on Term Frequency (tf) and Inverse Document Frequency (idf) that also uses the Vector Space Model (vsm) for multi-term queries.
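For reference, Lucene's classic practical scoring function combines those factors roughly as follows (a simplified rendering of the documented TF-IDF similarity):

    score(q,d) = queryNorm(q) \cdot coord(q,d) \cdot
                 \sum_{t \in q} tf(t,d) \cdot idf(t)^2 \cdot boost(t) \cdot norm(t,d)

Here tf rewards repeated occurrences of a term in a document, idf down-weights terms that are common across the whole index, coord rewards documents that match more of the query's terms, and the norm folds in field length and index-time boosts.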


1 Answer

By default, when Elasticsearch indexes a string, it breaks it up into tokens. For example, "Fox jump over the wall" will be tokenized into the individual words "fox", "jump", "over", "the", "wall" (the default standard analyzer also lowercases them).
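You can watch this happen with the Analyze API. A minimal sketch, assuming a node on localhost and the default standard analyzer (exact output varies by version; some older releases also dropped stop words such as "the"):

    # Ask Elasticsearch how it would tokenize this string
    curl -XGET 'http://localhost:9200/_analyze?analyzer=standard&pretty' \
      -d 'Fox jump over the wall'

The response lists the tokens actually stored in the index ("fox", "jump", ...) rather than the original sentence.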

So what does this mean for you? If you search your documents with certain queries, you may not find the exact string you expect, because Elasticsearch matches against the tokenized words rather than the entire string, and that can significantly affect your search results.

For example, an exact-match (term) query for "Fox jump over the wall" will return no results, because no single stored token equals that whole string. Searching for "Fox" with an analyzed query instead will get you a result.
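To make that concrete, here is a sketch of both searches using term queries, which are not analyzed and therefore only match whole stored tokens. The articles index and title field are hypothetical; note the lowercase "fox", since the standard analyzer lowercases tokens at index time:

    # Returns no hits: no single stored token equals the whole sentence
    curl -XGET 'http://localhost:9200/articles/_search' -d '{
      "query": { "term": { "title": "Fox jump over the wall" } }
    }'

    # Returns hits: "fox" is one of the stored tokens
    curl -XGET 'http://localhost:9200/articles/_search' -d '{
      "query": { "term": { "title": "fox" } }
    }'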

Marking a field as not_analyzed in its mapping tells Elasticsearch not to tokenize the indexed string, so that you can properly match exact strings. This is particularly useful when you want to facet on entire strings (e.g. with a terms facet).
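A minimal sketch of such a mapping, using the pre-5.x not_analyzed string syntax (index and field names are hypothetical):

    # Create an index whose tags field is stored as one untokenized string
    curl -XPUT 'http://localhost:9200/articles' -d '{
      "mappings": {
        "article": {
          "properties": {
            "tags": { "type": "string", "index": "not_analyzed" }
          }
        }
      }
    }'

    # A terms facet on that field now counts whole values
    curl -XGET 'http://localhost:9200/articles/_search' -d '{
      "query": { "match_all": {} },
      "facets": {
        "popular_tags": { "terms": { "field": "tags" } }
      }
    }'

Without not_analyzed, the facet would count individual tokens ("breaking", "news") instead of whole tags ("breaking news").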

Tokenizers just break strings into individual tokens, which Elasticsearch then stores in the index. As mentioned, these tokens are what your queries are matched against through the Search API.
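An analyzer bundles one tokenizer with zero or more token filters. A sketch of defining a custom analyzer in the index settings, splitting on whitespace and then lowercasing (all names here are hypothetical):

    curl -XPUT 'http://localhost:9200/blog' -d '{
      "settings": {
        "analysis": {
          "analyzer": {
            "plain_lowercase": {
              "type": "custom",
              "tokenizer": "whitespace",
              "filter": ["lowercase"]
            }
          }
        }
      }
    }'

Any string field can then opt into it via the analyzer option in its mapping.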

Filters, in the query sense (not to be confused with the token filters used inside analyzers), create a subset of your query results under specific conditions that you specify, helping you separate what you need from what you do not need in your search results.
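For example, using the filtered query of that era (superseded by the bool query's filter clause in Elasticsearch 2.x and later; the field names are hypothetical):

    # Score documents matching "fox", but only among published ones
    curl -XGET 'http://localhost:9200/articles/_search' -d '{
      "query": {
        "filtered": {
          "query":  { "match": { "body": "fox" } },
          "filter": { "term": { "status": "published" } }
        }
      }
    }'

The filter cheaply restricts the candidate set without affecting relevance scores, while the inner query still ranks the surviving documents.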

answered Dec 30 '22 by Jonathan Moo