I'm trying to control the order in which token filters are applied in Elasticsearch.
I know from the docs that the tokenizer is applied first, then the token filters, but they do not mention how the order of the token filters is determined.
Here's a YAML snippet from my analysis setup script:
KeywordNameIndexAnalyzer:
    type: custom
    tokenizer: whitespace
    filter: [my_word_concatenator, keyword_ngram]
I would have thought that my_word_concatenator would be applied before keyword_ngram, but it seems like that isn't the case. Does anyone know how (or if) the order of these filters can be controlled?
Thanks a lot!
Token filters accept a stream of tokens from a tokenizer and can modify tokens (e.g. lowercasing), delete tokens (e.g. removing stopwords) or add tokens (e.g. synonyms). Elasticsearch has a number of built-in token filters you can use to build custom analyzers.
Elasticsearch analyzers and normalizers are used to convert text into tokens that can be searched. Analyzers use a tokenizer to produce one or more tokens per text field. Normalizers use only character filters and token filters to produce a single token.
ASCII folding token filter: Converts alphabetic, numeric, and symbolic characters that are not in the Basic Latin Unicode block (first 127 ASCII characters) to their ASCII equivalent, if one exists. For example, the filter changes à to a.
Tokenizers break field data into lexical units, or tokens. Filters examine a stream of tokens and keep them, transform or discard them, or create new ones. Tokenizers and filters may be combined to form pipelines, or chains, where the output of one is input to the next.
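To make the chaining idea concrete, here is a minimal Python sketch of such a pipeline. It does not use Elasticsearch at all: `concatenate` and `bigrams` are hypothetical stand-ins (the real behavior of my_word_concatenator and keyword_ngram is not shown in the question), and the only point is that each filter receives the output of the previous one, so swapping two filters can change the result.

```python
from typing import Callable, List

# A token filter takes a list of tokens and returns a new list of tokens.
TokenFilter = Callable[[List[str]], List[str]]


def whitespace_tokenizer(text: str) -> List[str]:
    """Split text on runs of whitespace."""
    return text.split()


def concatenate(tokens: List[str]) -> List[str]:
    """Hypothetical stand-in for a 'word concatenator': join all tokens into one."""
    return ["".join(tokens)] if tokens else []


def bigrams(tokens: List[str]) -> List[str]:
    """Hypothetical stand-in for an n-gram filter: emit every 2-character gram."""
    return [t[i:i + 2] for t in tokens for i in range(len(t) - 1)]


def analyze(text: str, filters: List[TokenFilter]) -> List[str]:
    """Run the tokenizer, then apply each filter in list order."""
    tokens = whitespace_tokenizer(text)
    for token_filter in filters:
        tokens = token_filter(tokens)
    return tokens


print(analyze("ab cd", [concatenate, bigrams]))  # concatenate first, then n-gram
print(analyze("ab cd", [bigrams, concatenate]))  # n-gram first, then concatenate
```

With this toy chain, the first call produces grams that span the word boundary ("bc"), while the second does not, which is the kind of difference to look for when checking whether filters run in the expected order.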
An analyzer is made of a tokenizer, which splits your text into tokens. After that, the token filters come into the picture, in the order you configured them, since you're providing an array. So with your configuration, my_word_concatenator runs first and keyword_ngram runs on its output. If you have doubts, I'd suggest having a look at the analyze API, through which you can test how an analyzer works without indexing any text.
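As a rough sketch of how such a check could look, the Python snippet below builds the request for the analyze API without sending it. The index name `my_index` and the sample text are assumptions made up for illustration; the analyzer name is the one from the question. The request targets the index-scoped `_analyze` endpoint so that the custom analyzer defined in that index's settings can be resolved.

```python
import json
from typing import Tuple


def build_analyze_request(index: str, analyzer: str, text: str) -> Tuple[str, str]:
    """Return the URL path and JSON body for an index-scoped _analyze call."""
    path = f"/{index}/_analyze"
    body = json.dumps({"analyzer": analyzer, "text": text})
    return path, body


# 'my_index' is a hypothetical index name; substitute the index that
# actually holds the KeywordNameIndexAnalyzer definition.
path, body = build_analyze_request(
    "my_index", "KeywordNameIndexAnalyzer", "quick brown fox"
)
print("GET", path)
print(body)
```

Sending that body to the printed path (with curl or any HTTP client) returns the list of tokens the analyzer emits, so you can see whether the n-grams were computed from the concatenated token or from the individual words.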