
Default index analyzer in elasticsearch

I am facing a problem with Elasticsearch: I don't want my indexed terms to be analyzed, but Elasticsearch's default settings tokenize them on whitespace. As a result, my facet query does not return the results I want.

I read that setting "index" : "not_analyzed" in the properties of the index type should work. The problem is that I don't know my document structure beforehand. I will be indexing arbitrary MySQL databases into Elasticsearch without knowing the table structure.

How can I set up Elasticsearch so that it uses "index" : "not_analyzed" by default unless told otherwise? Thanks.

PS: I am using Java, so if there is an API I can use directly for this, I would love that.

Global Warrior asked Apr 29 '14 13:04



2 Answers

I'd use dynamic templates - it should do what you are looking for:

{
    "testtemplates" : {
        "dynamic_templates" : [
            {
                "template1" : {
                    "match" : "*",
                    "match_mapping_type" : "string",
                    "mapping" : {
                        "type" : "string",
                        "index" : "not_analyzed"
                    }
                }
            }
        ]
    }
}

More on this approach here:

https://www.elastic.co/guide/en/elasticsearch/guide/current/custom-dynamic-mapping.html#dynamic-templates
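
Since the question mentions Java, here is a minimal sketch of building the same dynamic-template mapping as a JSON string in plain Java. The class name DynamicTemplateMapping and the method buildMapping are made up for illustration, and the sketch deliberately stops short of calling any Elasticsearch client, because the exact client call depends on your version. Treat it as a starting point, not a tested integration.

```java
public class DynamicTemplateMapping {

    // Builds the dynamic-template mapping shown above for a given type name.
    // typeName is a hypothetical parameter; only quotes and backslashes are escaped.
    static String buildMapping(String typeName) {
        String safeName = typeName.replace("\\", "\\\\").replace("\"", "\\\"");
        return "{"
            + "\"" + safeName + "\":{"
            + "\"dynamic_templates\":[{"
            + "\"template1\":{"
            + "\"match\":\"*\","
            + "\"match_mapping_type\":\"string\","
            + "\"mapping\":{\"type\":\"string\",\"index\":\"not_analyzed\"}"
            + "}}]}}";
    }

    public static void main(String[] args) {
        // Prints the mapping JSON for the type name used in the answer.
        System.out.println(buildMapping("testtemplates"));
    }
}
```

The resulting string could then be sent as the mapping source when the index or type is created, for example through the put-mapping request of the Java client. Check the client documentation for your Elasticsearch version before relying on that.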

Important: if someone suggests the approach below (setting the default analyzer to keyword) to solve the not_analyzed issue, it will not work! The keyword analyzer still does some analysis on the data and converts it to lowercase.

e.g. Data: ElasticSearchRocks ==> Keyword Analyzer: elasticsearchrocks

Try it yourself with an analyze query and see it happen.

curl -XPUT localhost:9200/testindex -d '{
    "index" : {
        "analysis" : {
            "analyzer" : {
                "default" : {
                    "type" : "keyword"
                }
            }
        }
    }
}'

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-keyword-analyzer.html

John Petrone answered Sep 22 '22 14:09


Add index.analysis.analyzer.default.type: keyword to your elasticsearch.yml.

Alex Ivasyuv answered Sep 23 '22 14:09