 

Changing the default analyzer in ElasticSearch or LogStash

I've got data coming in from Logstash that's being analyzed in an overeager manner. Essentially, a field value like "OS X 10.8" is broken into "OS", "X", and "10.8". I know I could just change the mapping and re-index for existing data, but how would I change the default analyzer (either in ElasticSearch or LogStash) to avoid this problem in future data?

Concrete Solution: I created a mapping for the type before I sent data to the new cluster for the first time.

Solution from IRC: Create an Index Template
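For future readers, here is a rough sketch of what such an index template could look like for Logstash-created indices. Everything in it is a placeholder I made up for illustration: the template name, the logstash-* pattern, the logs type, and the os field. The "template" key and the "string" / "not_analyzed" mapping match the Elasticsearch versions current when this was asked; newer releases use "index_patterns" (or the _index_template API) and "keyword" fields instead.

PUT /_template/default_analyzer_template
{
  "template": "logstash-*",
  "settings": {
    "analysis": {
      "analyzer": {
        "default": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": ["lowercase"]
        }
      }
    }
  },
  "mappings": {
    "logs": {
      "properties": {
        "os": { "type": "string", "index": "not_analyzed" }
      }
    }
  }
}

The template is applied automatically whenever a new matching index is created, so every future Logstash index picks up the same default analyzer and mapping without re-indexing anything.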

asked Nov 04 '13 by Brian Hicks


People also ask

What is the default analyzer for Elasticsearch?

By default, Elasticsearch uses the standard analyzer for all text analysis. The standard analyzer gives you out-of-the-box support for most natural languages and use cases. If you chose to use the standard analyzer as-is, no further configuration is needed.

How do I add an analyzer to an existing index Elasticsearch?

To add an analyzer, you must close the index, define the analyzer, and reopen the index. You cannot close the write index of a data stream. To update the analyzer for a data stream's write index and future backing indices, update the analyzer in the index template used by the stream.
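As a rough sketch of that close / update settings / reopen sequence (my-index is a placeholder, and the whitespace analyzer is just an example of something to switch to):

POST /my-index/_close

PUT /my-index/_settings
{
  "analysis": {
    "analyzer": {
      "default": {
        "type": "whitespace"
      }
    }
  }
}

POST /my-index/_open

Note that this only changes how future documents (and re-indexed ones) are analyzed; documents already in the index keep the tokens they were indexed with.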

What is the default tokenizer Elasticsearch?

The standard tokenizer, which the default standard analyzer uses. Its configuration option max_token_length sets the maximum token length: if a token exceeds this length it is split at max_token_length intervals. It defaults to 255.
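For illustration, max_token_length is set where the standard tokenizer is wired into a custom analyzer; the index, analyzer, and tokenizer names below are placeholders:

PUT /my-index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "my_tokenizer"
        }
      },
      "tokenizer": {
        "my_tokenizer": {
          "type": "standard",
          "max_token_length": 5
        }
      }
    }
  }
}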


1 Answer

According to this page, analyzers can be specified per-query, per-field or per-index.

At index time, Elasticsearch will look for an analyzer in this order:

  • The analyzer defined in the field mapping.
  • An analyzer named default in the index settings.
  • The standard analyzer.
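A small sketch of that index-time order (index and field names are placeholders, and the type-less mappings syntax assumes Elasticsearch 7 or later): the title field uses the english analyzer from its mapping, every other text field such as comment falls back to the default analyzer from the index settings, and an index defining neither would use standard.

PUT /my-index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "default": { "type": "simple" }
      }
    }
  },
  "mappings": {
    "properties": {
      "title":   { "type": "text", "analyzer": "english" },
      "comment": { "type": "text" }
    }
  }
}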

At query time, there are a few more layers:

  • The analyzer defined in a full-text query.
  • The search_analyzer defined in the field mapping.
  • The analyzer defined in the field mapping.
  • An analyzer named default_search in the index settings.
  • An analyzer named default in the index settings.
  • The standard analyzer.
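And a sketch of the query-time side, again with made-up names and Elasticsearch 7+ syntax: an analyzer passed in the match query wins if present; otherwise the field's search_analyzer is used, then its analyzer, then default_search / default from the index settings, then standard.

PUT /my-search-index
{
  "mappings": {
    "properties": {
      "os": {
        "type": "text",
        "analyzer": "standard",
        "search_analyzer": "whitespace"
      }
    }
  }
}

GET /my-search-index/_search
{
  "query": {
    "match": {
      "os": {
        "query": "OS X 10.8",
        "analyzer": "keyword"
      }
    }
  }
}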

On the other hand, this page points out an important detail:

An analyzer is registered under a logical name. It can then be referenced from mapping definitions or certain APIs. When none are defined, defaults are used. There is an option to define which analyzers will be used by default when none can be derived.

So the only way to define a custom analyzer as the default is to override one of the pre-defined analyzers, in this case the default analyzer. This means we cannot use an arbitrary name for our analyzer; it must be named default.

Here is a simple example of index settings (note that the analyzer must be named default):

{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0,
    "analysis": {
      "char_filter": {
        "charMappings": {
          "type": "mapping",
          "mappings": [
            "\\u200C => "
          ]
        }
      },
      "filter": {
        "persian_stop": {
          "type": "stop",
          "stopwords_path": "stopwords.txt"
        }
      },
      "analyzer": {
        "default": {<--------- analyzer name must be default
          "tokenizer": "standard",
          "char_filter": [
            "charMappings"
          ],
          "filter": [
            "lowercase",
            "arabic_normalization",
            "persian_normalization",
            "persian_stop"
          ]
        }
      }
    }
  }
}
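Once an index carries settings like these, a quick way to check what its default analyzer does to a value like the one in the question is the _analyze API; run against the index without naming a field or analyzer, it uses that index's default analyzer (the index name below is a placeholder):

GET /my-index/_analyze
{
  "text": "OS X 10.8"
}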
answered Sep 22 '22 by Saeed Zhiany