Elasticsearch not returning an exact match first

Tags:

elasticsearch

I have an Elasticsearch index with a field for exact matches. Somehow I get a lot of similar results (which I don't mind), and those similar results end up sorted before the exact match (which I do mind).

Can someone explain what's going on and how to fix it?

My mapping looks like this:

"exact":{
  "type":"string",
  "boost":10.0,
  "analyzer":"keyword"
},

My query, which searches for "AAPL P JAN 2014 885,00", looks like this:

{
  "size" : 21,
  "query" : {
    "field" : {
      "exact" : "AAPL P JAN 2014 885,00"
    }
  },
  "explain" : true,
  "sort" : [ {
    "_score" : {
      "order" : "desc"
    }
  } ],
  "facets" : {
    "category" : {
      "terms" : {
        "field" : "category",
        "size" : 10
      }
    }
  }
}

And the returned documents end up in this order:

{"exact":["APPLE INC","US0378331005","AAPL","73773"],"id-compound":"AAPL"}
{"exact":["AAPL","73773","AAPL P JAN 2014 675,00"],"id-compound":"AAPL*PUT*20140118*675"}
{"exact":["AAPL","73773","AAPL C JAN 2014 500,00"],"id-compound":"AAPL*CALL*20140118*500"}

etc., with the exact match several results further down the list.

Can someone explain to me why the exact match doesn't end on top?

The search results with the full explanation are below, in case they help make sense of things.

"hits" : [ {
  "_shard" : 0,
  "_node" : "1",
  "_index" : "instruments",
  "_type" : "instrument",
  "_id" : "AAPL",
  "_score" : 1306.8339, "_source" : {"exact":["APPLE INC","US0378331005","AAPL","73773"],"id-compound":"AAPL"},
  "_explanation" : {
    "value" : 1306.8339,
    "description" : "product of:",
    "details" : [ {
      "value" : 6534.169,
      "description" : "sum of:",
      "details" : [ {
        "value" : 6534.169,
        "description" : "weight(exact:AAPL in 9096), product of:",
        "details" : [ {
          "value" : 0.25854474,
          "description" : "queryWeight(exact:AAPL), product of:",
          "details" : [ {
            "value" : 6.1701355,
            "description" : "idf(docFreq=211, maxDocs=37299)"
          }, {
            "value" : 0.0419026,
            "description" : "queryNorm"
          } ]
        }, {
          "value" : 25272.875,
          "description" : "fieldWeight(exact:AAPL in 9096), product of:",
          "details" : [ {
            "value" : 1.0,
            "description" : "tf(termFreq(exact:AAPL)=1)"
          }, {
            "value" : 6.1701355,
            "description" : "idf(docFreq=211, maxDocs=37299)"
          }, {
            "value" : 4096.0,
            "description" : "fieldNorm(field=exact, doc=9096)"
          } ]
        } ]
      } ]
    }, {
      "value" : 0.2,
      "description" : "coord(1/5)"
    } ]
  }
}, {
  "_shard" : 0,
  "_node" : "1",
  "_index" : "instruments",
  "_type" : "instrument",
  "_id" : "AAPL*PUT*20140118*675",
  "_score" : 163.35423, "_source" : {"exact":["AAPL","73773","AAPL P JAN 2014 675,00"],"id-compound":"AAPL*PUT*20140118*675"},
  "_explanation" : {
    "value" : 163.35423,
    "description" : "product of:",
    "details" : [ {
      "value" : 816.7711,
      "description" : "sum of:",
      "details" : [ {
        "value" : 816.7711,
        "description" : "weight(exact:AAPL in 18), product of:",
        "details" : [ {
          "value" : 0.25854474,
          "description" : "queryWeight(exact:AAPL), product of:",
          "details" : [ {
            "value" : 6.1701355,
            "description" : "idf(docFreq=211, maxDocs=37299)"
          }, {
            "value" : 0.0419026,
            "description" : "queryNorm"
          } ]
        }, {
          "value" : 3159.1094,
          "description" : "fieldWeight(exact:AAPL in 18), product of:",
          "details" : [ {
            "value" : 1.0,
            "description" : "tf(termFreq(exact:AAPL)=1)"
          }, {
            "value" : 6.1701355,
            "description" : "idf(docFreq=211, maxDocs=37299)"
          }, {
            "value" : 512.0,
            "description" : "fieldNorm(field=exact, doc=18)"
          } ]
        } ]
      } ]
    }, {
      "value" : 0.2,
      "description" : "coord(1/5)"
    } ]
  }
}, {
  "_shard" : 0,
  "_node" : "1",
  "_index" : "instruments",
  "_type" : "instrument",
  "_id" : "AAPL*CALL*20140118*500",
  "_score" : 163.35423, "_source" : {"exact":["AAPL","73773","AAPL C JAN 2014 500,00"],"id-compound":"AAPL*CALL*20140118*500"},
  "_explanation" : {
    "value" : 163.35423,
    "description" : "product of:",
    "details" : [ {
      "value" : 816.7711,
      "description" : "sum of:",
      "details" : [ {
        "value" : 816.7711,
        "description" : "weight(exact:AAPL in 383), product of:",
        "details" : [ {
          "value" : 0.25854474,
          "description" : "queryWeight(exact:AAPL), product of:",
          "details" : [ {
            "value" : 6.1701355,
            "description" : "idf(docFreq=211, maxDocs=37299)"
          }, {
            "value" : 0.0419026,
            "description" : "queryNorm"
          } ]
        }, {
          "value" : 3159.1094,
          "description" : "fieldWeight(exact:AAPL in 383), product of:",
          "details" : [ {
            "value" : 1.0,
            "description" : "tf(termFreq(exact:AAPL)=1)"
          }, {
            "value" : 6.1701355,
            "description" : "idf(docFreq=211, maxDocs=37299)"
          }, {
            "value" : 512.0,
            "description" : "fieldNorm(field=exact, doc=383)"
          } ]
        } ]
      } ]
    }, {
      "value" : 0.2,
      "description" : "coord(1/5)"
    } ]
  }
}, {
  "_id" : "AAPL*PUT*20140118*940",
  "_score" : 163.35423, "_source" : {"exact":["AAPL","73773","AAPL P JAN 2014 940,00"],"id-compound":"AAPL*PUT*20140118*940"},
  "_explanation" : {
    "value" : 163.35423,
    "description" : "product of:",
    "details" : [ {
      "value" : 816.7711,
      "description" : "sum of:",
      "details" : [ {
        "value" : 816.7711,
        "description" : "weight(exact:AAPL in 794), product of:",
        "details" : [ {
          "value" : 0.25854474,
          "description" : "queryWeight(exact:AAPL), product of:",
          "details" : [ {
            "value" : 6.1701355,
            "description" : "idf(docFreq=211, maxDocs=37299)"
          }, {
            "value" : 0.0419026,
            "description" : "queryNorm"
          } ]
        }, {
          "value" : 3159.1094,
          "description" : "fieldWeight(exact:AAPL in 794), product of:",
          "details" : [ {
            "value" : 1.0,
            "description" : "tf(termFreq(exact:AAPL)=1)"
          }, {
            "value" : 6.1701355,
            "description" : "idf(docFreq=211, maxDocs=37299)"
          }, {
            "value" : 512.0,
            "description" : "fieldNorm(field=exact, doc=794)"
          } ]
        } ]
      } ]
    }, {
      "value" : 0.2,
      "description" : "coord(1/5)"
    } ]
  }
}
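The explain tree above can be checked by hand. The short Python sketch below recomputes two of the _score values using exactly the factors the explain output lists (queryWeight = idf × queryNorm, fieldWeight = tf × idf × fieldNorm, then × coord). All constants are copied from the output; nothing is queried from a cluster.

```python
# Recompute two _score values from the classic Lucene TF-IDF explain tree above.
# Every constant is copied from the "_explanation" output; no cluster is queried.

def classic_score(tf, idf, query_norm, field_norm, coord):
    """score = queryWeight * fieldWeight * coord, mirroring the explain tree."""
    query_weight = idf * query_norm        # queryWeight(exact:AAPL)
    field_weight = tf * idf * field_norm   # fieldWeight(exact:AAPL in <doc>)
    return query_weight * field_weight * coord


IDF = 6.1701355          # idf(docFreq=211, maxDocs=37299)
QUERY_NORM = 0.0419026   # queryNorm
COORD = 1 / 5            # coord(1/5): 1 of 5 query clauses matched

aapl = classic_score(1.0, IDF, QUERY_NORM, 4096.0, COORD)    # _id "AAPL"
option = classic_score(1.0, IDF, QUERY_NORM, 512.0, COORD)   # _id "AAPL*PUT*20140118*675"

print(f"AAPL:   {aapl:.2f}")
print(f"option: {option:.2f}")
print(f"ratio:  {aapl / option:.1f}")  # only fieldNorm differs: 4096 / 512
```

The results agree with the reported 1306.8339 and 163.35423 up to rounding (Lucene computes scores in single precision). The ratio is 8 because fieldNorm is the only factor that differs between the two documents, and every hit shown matched only the single term AAPL.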

And just in case, here's what happens if I analyze the data I'm trying to store:

curl -XGET 'localhost:9200/instruments/_analyze?field=exact&pretty=true' -d 'ING  P JUN 2013 6.00'
{
  "tokens" : [ {
    "token" : "ING  P JUN 2013 6.00",
    "start_offset" : 0,
    "end_offset" : 20,
    "type" : "word",
    "position" : 1
  } ]
}


asked May 16 '13 15:05

Constantijn Visinescu

2 Answers

I'm not sure it's technically the best approach, but if you're only after a single specific result from Elasticsearch, you could use a filter with a script that checks for an exact match.

{
  "from" : 0,
  "size" : 1,
  "query" : { 
    "text_phrase" : { 
      "exact" : "AAPL P JAN 2014 885,00"
    } 
  },
  "filter" : { 
    "script" : { 
      "script" : "_source.exact.contains(x)", 
      "params" : { 
        "x" : "AAPL P JAN 2014 885,00" 
      }  
    } 
  }
}

I've used this to fetch a single known entry from Elasticsearch, and it worked well for me.
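What the script filter `_source.exact.contains(x)` checks can be illustrated client-side. This is only a sketch: the third document, including its `id-compound` value, is hypothetical (the question never shows the exact-match document), and the real filter runs inside Elasticsearch against each candidate's `_source`.

```python
# Client-side illustration of the check "_source.exact.contains(x)":
# keep only documents whose multi-valued "exact" field holds x as one whole element.

def exact_contains(doc, x):
    """Whole-element membership test on the "exact" list."""
    return x in doc.get("exact", [])


docs = [
    {"id-compound": "AAPL",
     "exact": ["APPLE INC", "US0378331005", "AAPL", "73773"]},
    {"id-compound": "AAPL*PUT*20140118*675",
     "exact": ["AAPL", "73773", "AAPL P JAN 2014 675,00"]},
    # Hypothetical exact-match document, modelled on the others.
    {"id-compound": "AAPL*PUT*20140118*885",
     "exact": ["AAPL", "73773", "AAPL P JAN 2014 885,00"]},
]

x = "AAPL P JAN 2014 885,00"
matches = [d["id-compound"] for d in docs if exact_contains(d, x)]
print(matches)
```

Because list membership compares whole elements, a document that merely contains "AAPL" does not pass; only the document holding the full string does.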


answered Sep 30 '22 19:09

Matt Matthias

I think you have found your answer; I just wanted to give a bit more info for others with the same problem.

You are using a field query, which the Elasticsearch documentation describes as follows:

Field Query:

A query that executes a query string against a specific field. It is a simplified version of query_string query (by setting the default_field to the field this query executed against).

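I believe a query_string query is meant for full text: it parses the input with the Lucene query syntax, so your phrase is split on whitespace into five separate terms (AAPL, P, JAN, 2014 and 885,00) that are OR-ed together. That is what the coord(1/5) in your explain output reflects: only the AAPL term matched, so documents are ranked on that single term rather than on the whole string.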
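A rough simulation of how a query-string parser treats this input on a keyword-analyzed field. It is not Elasticsearch code; it assumes the input is split on whitespace, each piece passes through the keyword analyzer unchanged, and the pieces are OR-ed. The coord(1/5) in the question's explain output is consistent with five such clauses. The two documents are hypothetical, modelled on the question's `exact` values.

```python
# Rough simulation, not Elasticsearch code. Assumptions: whitespace split,
# keyword analysis of each piece (no change), pieces combined with OR.

def split_query(text):
    """Whitespace split done by a classic query-string parser."""
    return text.split()


def keyword_tokens(values):
    """Keyword analyzer: every stored value becomes exactly one token."""
    return set(values)


def clauses_matched(query_terms, doc_values):
    """Return (matched, total), the numbers behind Lucene's coord(matched/total)."""
    tokens = keyword_tokens(doc_values)
    matched = sum(1 for term in query_terms if term in tokens)
    return matched, len(query_terms)


query = "AAPL P JAN 2014 885,00"
terms = split_query(query)

# Hypothetical documents modelled on the question's "exact" values.
exact_doc = ["AAPL", "73773", "AAPL P JAN 2014 885,00"]
other_doc = ["AAPL", "73773", "AAPL P JAN 2014 675,00"]

print(terms)
print(clauses_matched(terms, exact_doc))  # only "AAPL" matches
print(clauses_matched(terms, other_doc))  # only "AAPL" matches here too

# Looking the whole string up as a single token (what a term query does)
# separates the two documents.
print(query in keyword_tokens(exact_doc), query in keyword_tokens(other_doc))
```

Under this model both documents match one clause out of five, so the whole-string value earns no extra credit, and the ranking falls to the remaining factors (idf, fieldNorm) for the shared AAPL term.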

What you want instead (and I think you found this out) is a term query, which does not analyze or split the search phrase and so only gives you exact matches.
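A minimal sketch of the request body for a term query on the `exact` field. The helper name `exact_term_query` is hypothetical; the field name and `size` come from the question, and the request itself is not sent so the snippet runs without a cluster.

```python
import json


def exact_term_query(value, size=21):
    """Hypothetical helper: build a term-query body for the "exact" field."""
    return {
        "size": size,
        "query": {
            # A term query is not analyzed: the whole string is looked up as one term.
            "term": {"exact": value}
        },
    }


body = exact_term_query("AAPL P JAN 2014 885,00")
print(json.dumps(body, indent=2))
```

The printed JSON can be sent to the instruments/_search endpoint the same way as the query in the question (for example with curl -d). Because the term is compared verbatim, case and spacing must match what was indexed, such as the double space in "ING  P JUN 2013 6.00".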

NOTE: Analysis happens at two distinct times: index time and query time. According to the Elasticsearch docs, the "analyzer" mapping setting applies during indexing and "when searching using a query string". I must admit I don't know exactly what the latter means (I would guess query_string, but it could also mean searches like http://../_search?q=exact:{query here}).

answered Sep 30 '22 19:09

ramseykhalaf
