
Why not use min_score with Elasticsearch?

New to Elasticsearch. I am interested in only returning the most relevant docs and came across min_score. The docs say "Note, most times, this does not make much sense" but don't provide a reason. So, why does it not make sense to use min_score?

EDIT: What I really want to do is only return documents that have a score higher than x. I have this:

data = {
        'min_score': 0.9,
        'query': {
            'match': {'field': 'michael brown'},
        }
    }

Is there a better alternative to the above so that it only returns the most relevant docs?

thx!

EDIT #2: I'm using minimum_should_match and it returns a 400 error:

"error": "SearchPhaseExecutionException[Failed to execute phase [query], all shards failed;"

data = {
        'query': {
            'match': {'keywords': 'michael brown'},
            'minimum_should_match': '90%',
        }
    }
user_78361084 asked Sep 05 '14 05:09




1 Answer

I don't know if it's the best solution, but it works for me (Java):

// "Tiny" search to discover the maximum score.
// It is fast because it returns only 1 item.
SearchResponse firstResponse = client.prepareSearch(INDEX_NAME)
                        .setTypes(TYPE_NAME)
                        .setQuery(queryBuilder)
                        .setSize(1)
                        .execute()
                        .actionGet();

// Get the max score and set minScore to 70% of it.
// The 0.7f literal keeps the arithmetic in float so the assignment compiles.
float maxScore = firstResponse.getHits().maxScore();
float minScore = maxScore * 0.7f;

// Second round, this time with the minimum score applied.
SearchResponse response = client.prepareSearch(INDEX_NAME)
                        .setTypes(TYPE_NAME)
                        .setQuery(queryBuilder)
                        .setMinScore(minScore)
                        .execute()
                        .actionGet();

I search twice, but the first search is fast because it returns only 1 item; its only purpose is to read max_score.
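Since the question uses Python, here is a rough sketch of the same two-pass idea with plain request bodies. The names search_relative_min_score and fake_search are made up for illustration. The search argument is assumed to be any callable that takes a request body dict and returns the parsed JSON response, and fake_search is only an in-memory stand-in so the sketch runs without a cluster.

```python
from typing import Any, Callable, Dict, Optional

SearchFn = Callable[[Dict[str, Any]], Dict[str, Any]]


def search_relative_min_score(search: SearchFn, query: Dict[str, Any],
                              ratio: float = 0.7) -> Optional[Dict[str, Any]]:
    """Run `query` twice: once to read max_score, once with min_score set."""
    # First pass: size 1 keeps the response small; only max_score is needed.
    probe = search({"size": 1, "query": query})
    max_score = probe["hits"].get("max_score")
    if max_score is None:  # nothing matched, so there is no score to scale
        return None
    # Second pass: keep only hits scoring at least `ratio` of the best hit.
    return search({"min_score": max_score * ratio, "query": query})


def fake_search(body: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical stand-in for a real search call, with fixed scores."""
    scores = [2.0, 1.5, 1.0]
    floor = body.get("min_score", 0.0)
    kept = [s for s in scores if s >= floor][: body.get("size", 10)]
    return {"hits": {"max_score": max(kept) if kept else None,
                     "hits": [{"_score": s} for s in kept]}}


result = search_relative_min_score(
    fake_search, {"match": {"field": "michael brown"}}, ratio=0.7)
```

With a real client you would pass a small wrapper around its search call instead of fake_search. The threshold is relative to the best hit of this particular query, so it adapts when absolute scores differ from one query to the next.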

NOTE: minimum_should_match works differently. If you have 4 queries and you set minimum_should_match = 70%, it doesn't mean that item.score should be > 70%. It means that the item should match at least 70% of the queries. Elasticsearch rounds the computed number down, so 70% of 4 (2.8) means at least 2 of the 4 queries.
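For the 400 error in EDIT #2, a likely cause is that minimum_should_match sits next to the field name instead of under it. As far as I know, the match query only accepts extra options when the field maps to an object with a "query" key. A minimal sketch follows; the helper names are made up, and the rounding helper reflects my reading of the documented rule that a positive percentage is rounded down.

```python
def match_with_min_should_match(field: str, text: str, minimum: str) -> dict:
    """Build a match query body with minimum_should_match nested under the field."""
    return {
        "query": {
            "match": {
                field: {
                    "query": text,                     # the text to analyze and match
                    "minimum_should_match": minimum,   # option belongs here, under the field
                }
            }
        }
    }


def required_clauses(total_clauses: int, percent: int) -> int:
    """Clauses that must match for a positive percentage, rounded down (assumed rule)."""
    return (total_clauses * percent) // 100


data = match_with_min_should_match("keywords", "michael brown", "90%")
needed = required_clauses(2, 90)  # two terms: "michael" and "brown"
```

Under that rounding rule, '90%' on a two-term query works out to 1 required term, so it would not force both "michael" and "brown" to match. Either way it constrains how many terms match, not the score.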

César Mora answered Sep 22 '22 01:09
