ElasticSearch n-gram tokenfilter not finding partial words

I have been playing around with ElasticSearch for a new project of mine. I have set the default analyzers to use the ngram tokenfilter. This is my elasticsearch.yml file:

index:
    analysis:
        analyzer:
            default_index:
                tokenizer: standard
                filter: [standard, stop, mynGram]
            default_search:
                tokenizer: standard
                filter: [standard, stop]

        filter:
            mynGram:
                type: nGram
                min_gram: 1
                max_gram: 10

I created a new index and added the following document to it:

$ curl -XPUT http://localhost:9200/test/newtype/3 -d '{"text": "one two three four five six"}'
{"ok":true,"_index":"test","_type":"newtype","_id":"3"}

However, when I search with the query text:hree, text:ive, or any other partial term, ElasticSearch does not return this document. It returns the document only when I search for an exact term (such as text:two).

I have also tried changing the config file so that default_search uses the ngram token filter as well, but the result was the same. What am I doing wrong here, and how do I correct it?

asleepysamurai asked Feb 18 '11 17:02


2 Answers

You should check the get mapping API to see if your mapping has been applied: http://www.elasticsearch.org/guide/reference/api/admin-indices-get-mapping.html

Btw, it has been said on the mailing list that when an index already contains documents, the mappings you put in elasticsearch.yml are not applied to it. You need to clean (delete and recreate) your index first.

I've tried ngrams with ES and it works fine for me.

Sebastien Lorber answered Oct 27 '22 05:10


Not sure about the default_* settings. But applying a mapping that specifies index_analyzer and search_analyzer works:

curl -XDELETE localhost:9200/twitter
curl -XPOST localhost:9200/twitter -d '
{"index": {
    "number_of_shards": 1,
    "analysis": {
        "filter": {
            "mynGram": {"type": "nGram", "min_gram": 2, "max_gram": 10}
        },
        "analyzer": {
            "a1": {
                "type": "custom",
                "tokenizer": "standard",
                "filter": ["lowercase", "mynGram"]
            }
        }
    }
}}'

curl -XPUT localhost:9200/twitter/tweet/_mapping -d '{
    "tweet" : {
        "index_analyzer" : "a1",
        "search_analyzer" : "standard", 
        "date_formats" : ["yyyy-MM-dd", "dd-MM-yyyy"],
        "properties" : {
            "user": {"type":"string", "analyzer":"standard"},
            "message" : {"type" : "string" }
        }
    }}'

curl -XPUT 'http://localhost:9200/twitter/tweet/1' -d '{
    "user" : "kimchy",
    "post_date" : "2009-11-15T14:12:12",
    "message" : "trying out Elastic Search"
}'

curl -XGET 'localhost:9200/twitter/_search?q=ear'
curl -XGET 'localhost:9200/twitter/_search?q=sea'

curl -XGET localhost:9200/twitter/_mapping
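
To see why q=ear and q=sea should match here, this rough Python sketch imitates what the a1 analyzer does at index time. It is not ElasticSearch code: the regex split is only an approximation of the standard tokenizer, and the helper names ngrams and index_terms are made up for illustration. The search side uses the standard analyzer, so a query like "ear" stays a single term and only has to equal one of the indexed grams.

```python
import re

def ngrams(token, min_gram, max_gram):
    """Return every substring of token with length in [min_gram, max_gram]."""
    grams = set()
    for size in range(min_gram, min(max_gram, len(token)) + 1):
        for start in range(len(token) - size + 1):
            grams.add(token[start:start + size])
    return grams

def index_terms(text, min_gram=2, max_gram=10):
    # Rough stand-in for the "a1" analyzer (an assumption, not the real
    # implementation): split on non-word characters, lowercase, then
    # n-gram each token with the same bounds as the mynGram filter above.
    terms = set()
    for token in re.findall(r"\w+", text.lower()):
        terms |= ngrams(token, min_gram, max_gram)
    return terms

terms = index_terms("trying out Elastic Search")
for query in ("ear", "sea", "xyz"):
    # A query term matches only if it equals one of the indexed grams.
    print(query, query in terms)
```

If the analyzer is really applied to the index, "ear" and "sea" are among the stored terms, because both are substrings of "search" with a length between 2 and 10. If partial terms still return nothing, that suggests the n-gram filter was never applied at index time, which is what the _mapping call above helps you check.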

bdargan answered Oct 27 '22 04:10