ElasticSearch edgeNGram for autocomplete/typeahead: is my search_analyzer being ignored?

I've got three documents with a "userName" field:

  • 'briandilley'
  • 'briangumble'
  • 'briangriffen'

When I search for 'brian' I get all three back, as expected, but when I search for 'briandilley' I still get all three back. The Analyze API tells me that the ngram filter is being applied to my search string, but I'm not sure why. Here's my setup:

Index settings:

{
    "analysis": {
        "analyzer": {
            "username_index": {
                "tokenizer": "keyword",
                "filter": ["lowercase", "username_ngram"]
            },
            "username_search": {
                "tokenizer": "keyword",
                "filter": ["lowercase"]
            }
        },
        "filter": {
            "username_ngram": {
                "type": "edgeNGram",
                "side" : "front",
                "min_gram": 1,
                "max_gram": 15
            }
        }
    }
}
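To make these settings concrete, here is a rough Python model (not Elasticsearch code) of the tokens each analyzer should emit. It assumes the keyword tokenizer keeps the whole input as one token and that the front edge n-gram filter emits every prefix from `min_gram` to `max_gram` characters; the Python function names simply mirror the analyzer names and are hypothetical.

```python
from typing import List


def username_index(text: str, min_gram: int = 1, max_gram: int = 15) -> List[str]:
    """Model of "username_index": keyword tokenizer + lowercase + front edge n-grams."""
    token = text.lower()  # keyword tokenizer: the whole input is a single token
    upper = min(max_gram, len(token))  # never emit grams longer than the token or max_gram
    return [token[:n] for n in range(min_gram, upper + 1)]


def username_search(text: str) -> List[str]:
    """Model of "username_search": keyword tokenizer + lowercase, no n-grams."""
    return [text.lower()]


print(username_index("BrianDilley"))
# -> ['b', 'br', 'bri', 'bria', 'brian', 'briand', 'briandi', 'briandil', 'briandill', 'briandille', 'briandilley']
print(username_search("BrianDilley"))
# -> ['briandilley']
```

If the search analyzer is honoured, a query for 'briandilley' should be looked up as the single term 'briandilley', and only one of the three documents has that gram.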

Mapping:

{
    "user_follow": {

        "properties": {
            "targetId": { "type": "string", "store": true },
            "followerId": { "type": "string", "store": true },
            "dateUpdated": { "type": "date", "store": true },

            "userName": {
                "type": "multi_field",
                "fields": {
                    "userName": {
                        "type": "string",
                        "index": "not_analyzed"
                    },
                    "autocomplete": {
                        "type": "string",
                        "index_analyzer": "username_index",
                        "search_analyzer": "username_search"
                    }
                }
            }
        }
    }
}
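The multi_field mapping indexes one source value into two sub-fields. As a simplified Python model (hypothetical helper names, not an Elasticsearch API), these are the terms each sub-field would receive: `userName` keeps the exact string because it is `not_analyzed`, while `userName.autocomplete` receives the edge n-grams from `username_index`. The search analyzer plays no part at this stage; it only applies to query text.

```python
from typing import Dict, List


def username_index(text: str, min_gram: int = 1, max_gram: int = 15) -> List[str]:
    # Model of the "username_index" analyzer: keyword + lowercase + front edge n-grams.
    token = text.lower()
    return [token[:n] for n in range(min_gram, min(max_gram, len(token)) + 1)]


def index_user_name(value: str) -> Dict[str, List[str]]:
    """Terms written for one userName value under the multi_field mapping (model only)."""
    return {
        "userName": [value],                             # not_analyzed: indexed verbatim
        "userName.autocomplete": username_index(value),  # index_analyzer: edge n-grams
    }


for field, terms in index_user_name("briangumble").items():
    print(field, terms)
```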

Search:

{
    "from" : 0,
    "size" : 50,
    "query" : {
        "bool" : {
            "must" : [ {
                "field" : {
                    "targetId" : "51888c1b04a6a214e26a4009"
                }
            }, {
                "match" : {
                    "userName.autocomplete" : {
                        "query" : "brian",
                        "type" : "boolean"
                    }
                }
            } ]
        }
    },
    "fields" : "followerId"
}

I've tried matchQuery, matchPhraseQuery, textQuery and termQuery (Java DSL API), and I get the same results every time.
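One way to see why this symptom points at the search analyzer: a `match` query with the default `or` operator matches a document if any analyzed query term is present in the field. The Python sketch below is a simplified model (one field, no scoring, hypothetical helper names), not Elasticsearch itself. If the query text were run through `username_index` instead of `username_search`, 'briandilley' would become all of its prefixes, and the shared prefixes such as 'brian' would match every document.

```python
from typing import Callable, Dict, List, Set


def username_index(text: str, min_gram: int = 1, max_gram: int = 15) -> List[str]:
    # Model of "username_index": keyword + lowercase + front edge n-grams.
    token = text.lower()
    return [token[:n] for n in range(min_gram, min(max_gram, len(token)) + 1)]


def username_search(text: str) -> List[str]:
    # Model of "username_search": keyword + lowercase only.
    return [text.lower()]


def build_index(names: List[str]) -> Dict[str, Set[int]]:
    """Inverted index for userName.autocomplete: term -> ids of docs containing it."""
    index: Dict[str, Set[int]] = {}
    for doc_id, name in enumerate(names):
        for term in username_index(name):
            index.setdefault(term, set()).add(doc_id)
    return index


def match_or(index: Dict[str, Set[int]], query: str,
             analyzer: Callable[[str], List[str]]) -> Set[int]:
    """Model of a match query with operator "or": union of postings of every query term."""
    hits: Set[int] = set()
    for term in analyzer(query):
        hits |= index.get(term, set())
    return hits


names = ["briandilley", "briangumble", "briangriffen"]
index = build_index(names)

# Intended: the query text is analyzed with username_search (one whole token).
print(sorted(names[i] for i in match_or(index, "briandilley", username_search)))
# -> ['briandilley']
# Suspected: username_index is applied to the query text as well.
print(sorted(names[i] for i in match_or(index, "briandilley", username_index)))
# -> ['briandilley', 'briangriffen', 'briangumble']
```

The second call reproduces what the question describes; the answer below shows that, with the mapping actually applied as written, Elasticsearch behaves like the first call.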

asked May 07 '13 05:05 by Brian Dilley


1 Answer

I think that you're not doing exactly what you think you're doing. This is why it is useful to present an actual test case with full curl statements, rather than abbreviating it.

Your example above works for me (slightly modified):

Create the index with settings and mapping:

curl -XPUT 'http://127.0.0.1:9200/test/?pretty=1'  -d '
{
  "mappings" : {
     "test" : {
        "properties" : {
           "userName" : {
              "fields" : {
                 "autocomplete" : {
                    "search_analyzer" : "username_search",
                    "index_analyzer" : "username_index",
                    "type" : "string"
                 },
                 "userName" : {
                    "index" : "not_analyzed",
                    "type" : "string"
                 }
              },
              "type" : "multi_field"
           }
        }
     }
  },
  "settings" : {
     "analysis" : {
        "filter" : {
           "username_ngram" : {
              "max_gram" : 15,
              "min_gram" : 1,
              "type" : "edge_ngram"
           }
        },
        "analyzer" : {
           "username_index" : {
              "filter" : [
                 "lowercase",
                 "username_ngram"
              ],
              "tokenizer" : "keyword"
           },
           "username_search" : {
              "filter" : [
                 "lowercase"
              ],
              "tokenizer" : "keyword"
           }
        }
     }
  }
}
'

Index some data:

curl -XPOST 'http://127.0.0.1:9200/test/test?pretty=1'  -d '{
  "userName" : "briangriffen"
}
'

curl -XPOST 'http://127.0.0.1:9200/test/test?pretty=1'  -d '
{
  "userName" : "brianlilley"
}
'

curl -XPOST 'http://127.0.0.1:9200/test/test?pretty=1'  -d '
{
  "userName" : "briangumble"
}
'

A search for brian finds all documents:

curl -XGET 'http://127.0.0.1:9200/test/test/_search?pretty=1'  -d '{
  "query" : {
     "match" : {
        "userName.autocomplete" : "brian"
     }
  }
}
'

# {
#    "hits" : {
#       "hits" : [
#          {
#             "_source" : {
#                "userName" : "briangriffen"
#             },
#             "_score" : 0.1486337,
#             "_index" : "test",
#             "_id" : "AWzezvEFRIykOAr75QbtcQ",
#             "_type" : "test"
#          },
#          {
#             "_source" : {
#                "userName" : "briangumble"
#             },
#             "_score" : 0.1486337,
#             "_index" : "test",
#             "_id" : "qIABuMOiTyuxLOiFOzcURg",
#             "_type" : "test"
#          },
#          {
#             "_source" : {
#                "userName" : "brianlilley"
#             },
#             "_score" : 0.076713204,
#             "_index" : "test",
#             "_id" : "fGgTITKvR6GJXI_cqA4Vzg",
#             "_type" : "test"
#          }
#       ],
#       "max_score" : 0.1486337,
#       "total" : 3
#    },
#    "timed_out" : false,
#    "_shards" : {
#       "failed" : 0,
#       "successful" : 5,
#       "total" : 5
#    },
#    "took" : 8
# }

A search for brianlilley finds just that document:

curl -XGET 'http://127.0.0.1:9200/test/test/_search?pretty=1'  -d '
{
  "query" : {
     "match" : {
        "userName.autocomplete" : "brianlilley"
     }
  }
}
'

# {
#    "hits" : {
#       "hits" : [
#          {
#             "_source" : {
#                "userName" : "brianlilley"
#             },
#             "_score" : 0.076713204,
#             "_index" : "test",
#             "_id" : "fGgTITKvR6GJXI_cqA4Vzg",
#             "_type" : "test"
#          }
#       ],
#       "max_score" : 0.076713204,
#       "total" : 1
#    },
#    "timed_out" : false,
#    "_shards" : {
#       "failed" : 0,
#       "successful" : 5,
#       "total" : 5
#    },
#    "took" : 4
# }
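One caveat these searches don't exercise, inferred from the settings rather than tested against Elasticsearch: with `max_gram` set to 15, and assuming the edge n-gram filter does not also keep the original token, no prefix longer than 15 characters is indexed in `userName.autocomplete`. A full-length query for a longer username, analyzed by `username_search` into one whole token, then finds no matching term. The Python model below uses a hypothetical fourth username, 'brianlilleythethird' (19 characters), to illustrate this.

```python
from typing import List, Set


def username_index(text: str, min_gram: int = 1, max_gram: int = 15) -> List[str]:
    # Model of "username_index": keyword + lowercase + front edge n-grams, capped at max_gram.
    token = text.lower()
    return [token[:n] for n in range(min_gram, min(max_gram, len(token)) + 1)]


def username_search(text: str) -> List[str]:
    # Model of "username_search": keyword + lowercase only.
    return [text.lower()]


def search(names: List[str], query: str) -> Set[str]:
    """Names whose indexed autocomplete terms contain any analyzed query term (model only)."""
    query_terms = set(username_search(query))
    return {name for name in names if query_terms & set(username_index(name))}


# "brianlilleythethird" is a hypothetical username longer than max_gram.
names = ["briangriffen", "brianlilley", "briangumble", "brianlilleythethird"]
print(sorted(search(names, "brianlilley")))
# -> ['brianlilley', 'brianlilleythethird']
print(sorted(search(names, "brianlilleythethird")))
# -> []
```

Whether that matters depends on how long usernames can be; raising `max_gram` or also querying the exact `userName` sub-field are the obvious knobs to consider.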
answered Sep 21 '22 13:09 by DrTech