ElasticSearch default scoring mechanism

What I am looking for is a plain, clear explanation of how the default scoring mechanism of ElasticSearch (Lucene) really works. Does it use Lucene's scoring, or does it have a scoring mechanism of its own?

For example, suppose I want to search for documents by the "Name" field. I use the .NET NEST client to write my queries. Consider this query:

IQueryResponse<SomeEntity> queryResult = client.Search<SomeEntity>(s =>
    s.From(0)
   .Size(300)
   .Explain()
   .Query(q => q.Match(a => a.OnField(q.Resolve(f => f.Name)).QueryString("ExampleName")))
);

which translates to the following JSON query:

{
  "from": 0,
  "size": 300,
  "explain": true,
  "query": {
    "match": {
      "Name": {
        "query": "ExampleName"
      }
    }
  }
}

The search runs over about 1.1 million documents. What I get in return is the following (only part of the result, formatted by me):

650   "ExampleName"                7.313398
651   "ExampleName"                7.313398
652   "ExampleName"                7.313398
653   "ExampleName"                7.239194
654   "ExampleName"                7.239194
860   "ExampleName of Something"   4.5708737

where the first field is just the Id, the second is the Name field on which ElasticSearch performed its search, and the third is the score.

As you can see, there are many duplicates in the ES index. Since some of the documents found have different scores even though they are exactly the same (only the Id differs), I concluded that different shards searched different parts of the whole dataset. That leads me to suspect that the score depends partly on the overall data in a given shard, not exclusively on the document actually being scored.

The question is: how exactly does this scoring work? Could you show me, or point me to, the exact formula used to calculate the score for each document found by ES? And finally, how can this scoring mechanism be changed?

asked Jul 08 '13 08:07

Przemysław Kalita

1 Answer

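The default scoring is the DefaultSimilarity algorithm in core Lucene, so ElasticSearch uses Lucene's scoring rather than one of its own. The formula is largely documented in the Javadoc of DefaultSimilarity's parent class, TFIDFSimilarity. You can customize scoring by configuring your own Similarity, or by using something like a custom_score query.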
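For reference, here is the practical scoring function as I understand it from the Lucene 4.x TFIDFSimilarity Javadoc, with the factor definitions that DefaultSimilarity plugs in. Treat it as a summary rather than a substitute for the Javadoc: it assumes a query-level boost of 1, and the norm is stored in a single byte, so its real value is only approximately what the last line gives.

```latex
\[
\mathrm{score}(q,d) = \mathrm{coord}(q,d)\cdot\mathrm{queryNorm}(q)\cdot
\sum_{t \in q} \mathrm{tf}(t,d)\cdot\mathrm{idf}(t)^{2}\cdot\mathrm{boost}(t)\cdot\mathrm{norm}(t,d)
\]
\begin{align*}
\mathrm{tf}(t,d)      &= \sqrt{\mathrm{freq}(t,d)} \\
\mathrm{idf}(t)       &= 1 + \ln\frac{\mathrm{numDocs}}{\mathrm{docFreq}(t)+1} \\
\mathrm{coord}(q,d)   &= \frac{\mathrm{overlap}}{\mathrm{maxOverlap}} \\
\mathrm{queryNorm}(q) &= \frac{1}{\sqrt{\sum_{t \in q}\bigl(\mathrm{idf}(t)\cdot\mathrm{boost}(t)\bigr)^{2}}} \\
\mathrm{norm}(t,d)    &\approx \frac{\mathrm{fieldBoost}}{\sqrt{\mathrm{numTerms\ in\ field}}}
\end{align*}
% Single-term query with boost 1: coord = 1 and queryNorm = 1/idf(t), so
\[
\mathrm{score}(q,d) = \mathrm{tf}(t,d)\cdot\mathrm{idf}(t)\cdot\mathrm{norm}(t,d)
\]
```

Note that numDocs and docFreq are index statistics, not properties of the document being scored. In the results from the question, 4.5708737 / 7.313398 is 0.625, which would be consistent with the longer Name value simply carrying a smaller length norm.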

The score variation among the first five results shown is small enough that it doesn't concern me much as far as the validity and ordering of the query results go. If you want to understand its cause, though, the explain API can show you exactly what is going on there.
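One likely cause, which the explain output should confirm or rule out, is that each shard computes idf from its own term statistics. The Python sketch below illustrates the effect for a single-term query; the shard sizes and document frequencies are made-up numbers, the function names are mine, and it ignores Lucene's float arithmetic and one-byte norm encoding.

```python
import math


def default_idf(doc_freq, num_docs):
    """idf as defined by Lucene's DefaultSimilarity: 1 + ln(numDocs / (docFreq + 1))."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def single_term_score(term_freq, doc_freq, num_docs, field_norm=1.0):
    """Score of a one-term query: queryNorm cancels one idf factor,
    leaving sqrt(freq) * idf * norm."""
    return math.sqrt(term_freq) * default_idf(doc_freq, num_docs) * field_norm


# Hypothetical per-shard statistics (not taken from the question's index):
# same shard size, slightly different number of documents containing the term.
shard_a = {"num_docs": 220_000, "doc_freq": 400}
shard_b = {"num_docs": 220_000, "doc_freq": 431}

# The same document (term appears once, norm 1.0) scored on each shard.
score_a = single_term_score(1, shard_a["doc_freq"], shard_a["num_docs"])
score_b = single_term_score(1, shard_b["doc_freq"], shard_b["num_docs"])

print("shard A:", score_a)
print("shard B:", score_b)
```

The two printed scores differ slightly even though the document is identical, because the term is a little more common on the second shard. If that turns out to be the cause, the dfs_query_then_fetch search type, which collects term statistics across shards before scoring, should make such duplicates score the same.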

answered Oct 08 '22 04:10

femtoRgon

Tags: search, lucene, elasticsearch, scoring
