How to remove duplicate search results in Elasticsearch?

First, create some example data (e1, e2, and e3 are types and test is the index name):

PUT test/e1/1
{
  "id": 1,
  "subject": "subject 1"
}
PUT test/e2/1
{
  "id": 1,
  "subject": "subject 2"
}
PUT test/e3/2
{
  "id": 2,
  "subject": "subject 3"
}

Now my question is: how can I get just the two documents below? In other words, how do I remove documents with a duplicate id from the curl -XGET _search result?

test/e1/1
{
  "id": 1,
  "subject": "subject 1"
}
test/e3/2
{
  "id": 2,
  "subject": "subject 3"
}
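If the result set is small enough to fetch in a single page, the same effect can be reproduced on the client. The following is a minimal Python sketch, assuming the hits have already been decoded from the _search response into dicts; dedupe_hits_by_id is a hypothetical helper, not part of any Elasticsearch client. It keeps the first hit seen for each id, so which duplicate survives depends on the order Elasticsearch returns them in, and duplicates that land on different result pages are not caught.

```python
from typing import Any, Dict, List


def dedupe_hits_by_id(hits: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep only the first hit for each value of _source["id"] (hypothetical helper)."""
    seen = set()
    unique = []
    for hit in hits:
        doc_id = hit["_source"]["id"]
        if doc_id not in seen:  # first time we meet this id: keep the hit
            seen.add(doc_id)
            unique.append(hit)
    return unique


# Hits shaped like the three example documents (scores and other metadata omitted).
hits = [
    {"_index": "test", "_type": "e1", "_id": "1", "_source": {"id": 1, "subject": "subject 1"}},
    {"_index": "test", "_type": "e2", "_id": "1", "_source": {"id": 1, "subject": "subject 2"}},
    {"_index": "test", "_type": "e3", "_id": "2", "_source": {"id": 2, "subject": "subject 3"}},
]

for hit in dedupe_hits_by_id(hits):
    print(hit["_type"], hit["_source"])
```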
asked Apr 27 '15 at 03:04 by navins





1 Answer

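First, search across all three types (e1, e2, e3) in one request.
Then, instead of reading the raw hits, group the results by id with a terms aggregation and keep a single document per group with a top_hits sub-aggregation, which removes the duplicate IDs.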

POST http://myElastic.com/test/e1,e2,e3/_search
{
  "aggs": {
    "dedup": {
      "terms": {
        "field": "id"
      },
      "aggs": {
        "dedup_docs": {
          "top_hits": {
            "size": 1
          }
        }
      }
    }
  }
}
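The response to this query places one document per id inside aggregations.dedup.buckets[].dedup_docs.hits.hits. Below is a minimal Python sketch for building the request body and reading those documents back, assuming the response has already been decoded from JSON; build_dedup_query and docs_from_buckets are hypothetical helper names. Two additions relative to the query in the answer are assumptions worth checking against your Elasticsearch version: "size": 0 at the top level, which suppresses the regular (still duplicated) hits, and an explicit terms size, since a terms aggregation returns only 10 buckets by default.

```python
import json
from typing import Any, Dict, List


def build_dedup_query(field: str = "id", max_groups: int = 10) -> Dict[str, Any]:
    """Terms aggregation on `field` with one top hit per bucket (hypothetical helper)."""
    return {
        "size": 0,  # assumption: skip the raw (duplicated) hits, read only the buckets
        "aggs": {
            "dedup": {
                # one bucket per distinct value of `field`, at most `max_groups` buckets
                "terms": {"field": field, "size": max_groups},
                # keep a single document from each bucket
                "aggs": {"dedup_docs": {"top_hits": {"size": 1}}},
            }
        },
    }


def docs_from_buckets(response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return the single top hit stored in each `dedup` bucket."""
    buckets = response["aggregations"]["dedup"]["buckets"]
    return [b["dedup_docs"]["hits"]["hits"][0] for b in buckets]


# Hand-written response mirroring the example data; not captured from a real cluster.
sample_response = {
    "aggregations": {
        "dedup": {
            "buckets": [
                {"key": 1, "doc_count": 2,
                 "dedup_docs": {"hits": {"hits": [
                     {"_type": "e1", "_id": "1", "_source": {"id": 1, "subject": "subject 1"}}]}}},
                {"key": 2, "doc_count": 1,
                 "dedup_docs": {"hits": {"hits": [
                     {"_type": "e3", "_id": "2", "_source": {"id": 2, "subject": "subject 3"}}]}}},
            ]
        }
    }
}

print(json.dumps(build_dedup_query(), indent=2))
for hit in docs_from_buckets(sample_response):
    print(hit["_type"], hit["_source"])
```

Because the query has no sort, top_hits picks among equally scored duplicates in no guaranteed order; adding a sort to the top_hits body is one way to make the choice of surviving document deterministic.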

This might help you:

  • search multi-index type
  • Remove duplicate documents from a search in Elasticsearch
  • Filter elasticsearch results to contain only unique documents based on one field value
answered Sep 17 '22 at 19:09 by Francois Combet
