 

Setting Elasticsearch Analyzer for new fields in logstash

Using the grok filter, we can add new fields to events in Logstash.
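For example, a minimal grok filter along these lines adds a new field (the pattern and field names are only illustrative):

filter {
  grok {
    # extract an id like "a_b" from the raw message into a new "id" field
    match => [ "message", "id=%{NOTSPACE:id}" ]
  }
}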

What I'm wondering is how to set the analyzer for that particular field.

For example, I have a new id field whose values look like a_b, but the default analyzer shipped with Elasticsearch will break this into a and b. Because of this I can't use the terms feature on that field efficiently and make it useful.

For this id field, I want to apply a custom analyzer of my own that doesn't tokenize the value but applies a lowercase filter.

How can this be done in Logstash?

Asked Dec 08 '13 by Vineeth Mohan

1 Answer

The default analyzer in Elasticsearch tokenizes terms using the standard tokenizer, which will split a_b into two terms, a and b; the default stop-words token filter then removes a, leaving just the single term b. You can use the _analyze API to see exactly how a piece of text is analyzed.
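For instance, the query-string form of the _analyze API (as in the 0.90/1.x releases this answer targets) lets you inspect the tokens directly:

# show how the default (standard) analyzer tokenizes the value
curl 'localhost:9200/_analyze?analyzer=standard&text=a_b'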

To analyze the field the way you described, we have to configure a custom analyzer:

"analyzer": {
    "my_id_analyzer": {
        "type": "custom",
        "tokenizer": "keyword",
        "filters": ["lowercase"]
    }
}

But since Logstash usually creates new indexes when required, we have to make sure this analyzer is available for every index at creation time. There are two ways to achieve this: 1) add it to the Elasticsearch node configuration (elasticsearch.yml), or 2) create an index template that includes the analyzer.
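Option 1 would look roughly like this in elasticsearch.yml; node-level analysis settings were honored by the Elasticsearch versions of that era, but treat this as a sketch:

# elasticsearch.yml: make the analyzer available to every index on this node
index:
  analysis:
    analyzer:
      my_id_analyzer:
        type: custom
        tokenizer: keyword
        filter: [lowercase]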

Since we only need this analyzer on specific indexes (i.e. those with the logstash- prefix), the index template API is the better fit:

curl localhost:9200/_template/logstash-id -XPUT -d '{
    "template": "logstash-*",
    "settings" : {
        "analysis": {
            "analyzer": {
                "my_id_analyzer": {
                    "type": "custom",
                    "tokenizer": "keyword",
                    "filters": ["lowercase"]
                }
            }
        }
    },
    "mappings": {
        "_default_": {
             "properties" : {
                "id" : { "type" : "string", "analyzer" : "my_id_analyzer" }
            }
        }
    }
}'

After running the above command, this template will apply to any index with the logstash- prefix created afterwards. The only "magic" part is the added mapping definition, which uses the built-in _default_ mapping, a placeholder for any type in the index: the mapping is applied to every type regardless of its actual name.
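To verify the setup, you can read the template back and, once Logstash has created a new index, run the analyzer against a sample value (the dated index name is just an example of what Logstash generates):

# inspect the stored template
curl localhost:9200/_template/logstash-id

# test the custom analyzer on a freshly created index;
# it should return the single lowercased token "a_b"
curl 'localhost:9200/logstash-2013.12.08/_analyze?analyzer=my_id_analyzer&text=A_B'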

Answered by Njal Karevoll