I've got data coming in from Logstash that's being analyzed in an overeager manner. Essentially, the field "OS X 10.8" would be broken into "OS", "X", and "10.8". I know I could just change the mapping and re-index for existing data, but how would I change the default analyzer (either in Elasticsearch or Logstash) to avoid this problem in future data?
Concrete Solution: I created a mapping for the type before I sent data to the new cluster for the first time.
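As a sketch of what such a mapping can look like (the index name logstash-example and field name os are placeholders; on Elasticsearch 5.x and later you would use the keyword type, while on the 1.x/2.x versions current at the time of this question the equivalent was a string field with "index": "not_analyzed"):

```json
PUT logstash-example
{
  "mappings": {
    "properties": {
      "os": {
        "type": "keyword"
      }
    }
  }
}
```

With this mapping, "OS X 10.8" is stored as a single un-analyzed term instead of being tokenized.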
Solution from IRC: Create an Index Template
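An index template applies the mapping automatically to every new index matching a pattern, which is what you want for time-based Logstash indices. A minimal sketch (the template name, the logstash-* pattern, and the os field are assumptions; this uses the composable _index_template API from Elasticsearch 7.8+, older versions used _template instead):

```json
PUT _index_template/logstash_os_keyword
{
  "index_patterns": ["logstash-*"],
  "template": {
    "mappings": {
      "properties": {
        "os": { "type": "keyword" }
      }
    }
  }
}
```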
By default, Elasticsearch uses the standard analyzer for all text analysis. The standard analyzer gives you out-of-the-box support for most natural languages and use cases. If you choose to use the standard analyzer as-is, no further configuration is needed.
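You can see exactly why "OS X 10.8" gets split by running the text through the _analyze API with the standard analyzer:

```json
POST _analyze
{
  "analyzer": "standard",
  "text": "OS X 10.8"
}
```

This returns the three lowercased tokens os, x, and 10.8, which is the behaviour described in the question.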
To add an analyzer, you must close the index, define the analyzer, and reopen the index. You cannot close the write index of a data stream. To update the analyzer for a data stream's write index and future backing indices, update the analyzer in the index template used by the stream.
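The close/update/reopen sequence described above looks like this (my-index is a placeholder, and the whitespace tokenizer is just an illustrative choice of analyzer):

```json
POST my-index/_close

PUT my-index/_settings
{
  "analysis": {
    "analyzer": {
      "default": {
        "type": "custom",
        "tokenizer": "whitespace"
      }
    }
  }
}

POST my-index/_open
```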
Configuration: max_token_length — the maximum token length. If a token is seen that exceeds this length, it is split at max_token_length intervals. Defaults to 255.
According to this page, analyzers can be specified per-query, per-field, or per-index.
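For instance, per-query, you can override the analyzer directly inside a full-text query (the index name, field name, and choice of whitespace analyzer here are illustrative):

```json
GET my-index/_search
{
  "query": {
    "match": {
      "os": {
        "query": "OS X 10.8",
        "analyzer": "whitespace"
      }
    }
  }
}
```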
At index time, Elasticsearch will look for an analyzer in this order:

1. The analyzer defined in the field mapping.
2. The default analyzer in the index settings.
3. The standard analyzer.

At query time, there are a few more layers:

1. The analyzer defined in a full-text query.
2. The search_analyzer defined in the field mapping.
3. The analyzer defined in the field mapping.
4. The default_search analyzer in the index settings.
5. The default analyzer in the index settings.
6. The standard analyzer.

On the other hand, this page points to an important thing:
An analyzer is registered under a logical name. It can then be referenced from mapping definitions or certain APIs. When none are defined, defaults are used. There is an option to define which analyzers will be used by default when none can be derived.
So the only way to define a custom analyzer as the default is to override one of the pre-defined analyzers, in this case the default analyzer. This means we cannot use an arbitrary name for our analyzer; it must be named default.
Here is a simple example of index settings:
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0,
    "analysis": {
      "char_filter": {
        "charMappings": {
          "type": "mapping",
          "mappings": [
            "\\u200C => "
          ]
        }
      },
      "filter": {
        "persian_stop": {
          "type": "stop",
          "stopwords_path": "stopwords.txt"
        }
      },
      "analyzer": {
        "default": {          <--------- analyzer name must be "default"
          "tokenizer": "standard",
          "char_filter": [
            "charMappings"
          ],
          "filter": [
            "lowercase",
            "arabic_normalization",
            "persian_normalization",
            "persian_stop"
          ]
        }
      }
    }
  }
}
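Once an index has been created with these settings, you can verify that your custom analyzer is actually being picked up as the default by calling _analyze on the index without naming an analyzer (my-index is a placeholder; when no analyzer is specified, Elasticsearch uses the index's default analyzer):

```json
POST my-index/_analyze
{
  "text": "OS X 10.8"
}
```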