Diversified results on Elasticsearch search

I've built a complex Elasticsearch query that uses popularity to improve the ranking of social media documents. The query works well: the top results are always relevant to the query and contain interesting items.

However, it has a problem: for some queries, the first results are all from the same user.

I would like to downscore a document if the same user already appears in a higher-ranked document. This way I expect to get more diverse results.

Note that I don't want these documents to be removed, as in some cases it may still be interesting to find more documents from the same user. I just want them to appear in a lower position.

Can anybody suggest a way to make this work?

As suggested in some comments, I'm adding a simplified version of my query:

query = {"function_score": {
  "functions": [
    {"gauss": {"createdAt":
        {"origin": "now", "scale": "30d", "offset": "7d", "decay": 0.9}
    }},
    {"gauss": {"shares.last.twitter_retweets_log":
        {"origin": 4.52, "scale": 2.61, "decay": 0.9}
    }}
  ],
  "query": {"bool": {"must": [
    {"exists": {"field": "images"}},
    {"multi_match": {"query": "foo boo", "fields": ["text", "link.title"]}}
  ]}},
  "score_mode": "multiply"
}};
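
For readers unfamiliar with the decay functions, here is a minimal sketch of how the two `gauss` clauses combine under `"score_mode": "multiply"`. It follows the Gaussian decay formula from the Elasticsearch function_score documentation. The document values (an age of 37 days, a log-retweet value of 1.91) are made up for illustration, and dates are simplified to plain day counts.

```python
import math

def gauss_decay(value, origin, scale, offset=0.0, decay=0.5):
    """Gaussian decay factor in (0, 1], per the function_score docs."""
    # Anything within `offset` of the origin counts as zero distance.
    distance = max(0.0, abs(value - origin) - offset)
    # sigma^2 is chosen so the factor equals `decay` at distance == scale.
    sigma_sq = -(scale ** 2) / (2.0 * math.log(decay))
    return math.exp(-(distance ** 2) / (2.0 * sigma_sq))

# Hypothetical document: created 37 days ago (origin "now" modelled as day 0).
recency = gauss_decay(37, origin=0, scale=30, offset=7, decay=0.9)

# Hypothetical log-retweet value exactly one `scale` away from the origin.
popularity = gauss_decay(1.91, origin=4.52, scale=2.61, decay=0.9)

# "score_mode": "multiply" multiplies the function results together.
combined = recency * popularity

print(round(recency, 3), round(popularity, 3), round(combined, 3))
```

Both factors depend only on the document itself, so nothing in this query can react to which users already appear higher in the result list.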

P.S.: Some documentation that may be relevant, since it covers diversity, although I'm not sure how to apply it:

https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-sampler-aggregation.html?q=sampler
https://lucene.apache.org/core/5_1_0/misc/org/apache/lucene/search/DiversifiedTopDocsCollector.html

asked Dec 11 '15 10:12

David Mabodo

1 Answer

You can couple the sampler with the top_hits aggregation to get diversified results.

{
    "query": {
        "match": {
            "query": "iphone"
        }
    },
    "size":0,
    "aggs": {
        "sample": {
            "sampler": {
                "shard_size": 200,
                "field" : "user.id"                
            },
            "aggs": {
                "diversifiedMatches": {
                    "top_hits": {
                        "size":10
                    }
                }
            }
        }
    }
}
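
Because the request sets "size": 0, the top-level hits array comes back empty and the diversified documents are nested under the aggregation names. A minimal Python sketch of pulling them out follows. The response below is hand-written and abbreviated for illustration, and the `user_id` field in `_source` is an assumption. Also note that, as far as I know, in Elasticsearch 5.0 and later the `field` option belongs to the separate `diversified_sampler` aggregation rather than to `sampler`.

```python
def diversified_hits(response, sampler_name="sample",
                     top_hits_name="diversifiedMatches"):
    """Return the hits nested under the sampler and top_hits aggregations."""
    sampler_bucket = response["aggregations"][sampler_name]
    return sampler_bucket[top_hits_name]["hits"]["hits"]

# Hand-written, abbreviated response (illustrative, not real output).
example_response = {
    "hits": {"total": 2, "hits": []},  # empty because of "size": 0
    "aggregations": {
        "sample": {
            "doc_count": 2,
            "diversifiedMatches": {
                "hits": {
                    "total": 2,
                    "max_score": 1.3,
                    "hits": [
                        {"_id": "1", "_score": 1.3,
                         "_source": {"user_id": "alice"}},
                        {"_id": "2", "_score": 0.9,
                         "_source": {"user_id": "bob"}},
                    ],
                },
            },
        },
    },
}

matches = diversified_hits(example_response)
user_ids = [hit["_source"]["user_id"] for hit in matches]
print(user_ids)
```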

There are some caveats, e.g.:

1) Deduplication is per shard, not global.

2) The diversification field must be a single-valued field.

3) Pagination is not supported.

4) Sorting on anything other than score is not supported.

Addressing these issues would be hard: it would require expensive and complex coordination internally, plus more guidance from the client about when and where "duplicate" results can be reintroduced (page 2? page 3? how many?).
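
Since the question asks for demotion rather than removal, one workaround is to over-fetch from a normal query and re-rank on the client. The sketch below is not a feature of the sampler aggregation or of Elasticsearch. It is a plain Python illustration in which each further hit from the same user has its score multiplied by an assumed `penalty` factor. The `id`, `user_id`, and `score` keys are hypothetical and would need to be mapped from your own hits.

```python
from collections import defaultdict

def demote_repeated_users(hits, user_key="user_id", penalty=0.5):
    """Re-rank hits so repeated users sink instead of being dropped."""
    seen = defaultdict(int)  # how many hits each user already has
    rescored = []
    # Walk the hits in their original ranking order (best score first).
    for hit in sorted(hits, key=lambda h: h["score"], reverse=True):
        user = hit[user_key]
        # The k-th repeat from the same user is scaled by penalty ** k.
        adjusted = hit["score"] * (penalty ** seen[user])
        seen[user] += 1
        rescored.append({**hit, "adjusted_score": adjusted})
    return sorted(rescored, key=lambda h: h["adjusted_score"], reverse=True)

# Made-up hits where one user dominates the top of the list.
hits = [
    {"id": "a1", "user_id": "alice", "score": 10.0},
    {"id": "a2", "user_id": "alice", "score": 9.0},
    {"id": "a3", "user_id": "alice", "score": 8.0},
    {"id": "b1", "user_id": "bob", "score": 7.0},
    {"id": "c1", "user_id": "carol", "score": 4.0},
]

reranked_ids = [h["id"] for h in demote_repeated_users(hits)]
print(reranked_ids)
```

This only diversifies within the set of hits that was fetched, so pagination still needs care: the client has to fetch enough hits up front for the later pages.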

answered Sep 18 '22 13:09

MarkH

Tags: lucene, elasticsearch, search-engine, elasticsearch-aggregation
