
Return unique results in Elasticsearch

I have a use case with data like this:

{
    name: "John",
    parentid: "1234",
    filter: {a: '1', b: '3', c: '4'}
},
{
    name: "Tim",
    parentid: "2222",
    filter: {a: '2', b: '1', c: '4'}
},
{
    name: "Mary",
    parentid: "1234",
    filter: {a: '1', b: '3', c: '5'}
},
{
    name: "Tom",
    parentid: "2222",
    filter: {a: '1', b: '3', c: '1'}
}

Expected results:

bucket:[{
    key: "2222",
    hits: [{
        name: "Tom" ...
    }, 
    {
        name: "Tim" ...
    }]
},
{
    key: "1234",
    hits: [{
        name: "John" ...
    },
    {
        name: "Mary" ...
    }]
}]

I want to return documents grouped by unique parentid. I can use a top_hits aggregation for this, but I don't know how to paginate the buckets. The parentid values are more likely to differ than to repeat, so my bucket array would be large. I want to show all of the buckets, but paginated.

Priyank Bhatt asked Aug 03 '16 19:08




1 Answer

There is no direct way of doing this, but you can follow these steps to get the desired result.

Step 1. Get the list of all parentid values. You can obtain it with a simple terms aggregation on the parentid field. This returns only the list of parentid values, not the documents matching them, so the resulting array is smaller than the one you are currently expecting.

{
  "aggs": {
    "parentids": {
      "terms": {
        "field": "parentid",
        "size": 0 
      }
    }
  }
}

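Setting size to 0 makes the terms aggregation return all buckets. This only works on Elasticsearch versions before 5.0; later versions reject size: 0 in a terms aggregation, so an explicit bucket count is needed there.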

Alternatively, if you already know the full list of parentid values, you can go straight to step 2.
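Step 1 can be sketched in Python as follows. This is only an illustration under some assumptions: the helper names and the `max_buckets` default are made up for this example, the request uses an explicit terms size instead of `size: 0` so that it is also accepted by newer Elasticsearch versions, and the sample response is written by hand in the shape a terms aggregation returns rather than fetched from a real cluster.

```python
def build_parentid_terms_request(max_buckets=10000):
    """Build a search body that asks only for distinct parentid values.

    max_buckets is an assumed upper bound on the number of distinct values.
    """
    return {
        "size": 0,  # top-level size 0: return no hits, only the aggregation
        "aggs": {
            "parentids": {
                "terms": {"field": "parentid", "size": max_buckets}
            }
        },
    }


def extract_parentids(response):
    """Pull the bucket keys out of a terms aggregation response."""
    buckets = response["aggregations"]["parentids"]["buckets"]
    return [bucket["key"] for bucket in buckets]


# Hand-written response in the shape returned for the request above.
sample_response = {
    "aggregations": {
        "parentids": {
            "buckets": [
                {"key": "1234", "doc_count": 2},
                {"key": "2222", "doc_count": 2},
            ]
        }
    }
}
parentids = extract_parentids(sample_response)
print(parentids)
```

The resulting list of keys is what step 2 iterates over.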

Step 2. Fetch the related documents by filtering on parentid. This is where you can apply pagination.

{
  "from": 0,
  "size": 20, 
  "query": {
    "filtered": {
      "query": {
        "match_all": {}
      },
      "filter": {
        "term": {
          "parentid": "2222"
        }
      }
    }
  }
}

from and size control pagination, so you can loop through each parentid in the list and fetch all of its related documents page by page.
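The loop described above might look like the following Python sketch. It makes several assumptions: `search` is a hypothetical callable that takes a request body and returns a dict shaped like an Elasticsearch search response, `fake_search` and `DOCS` are in-memory stand-ins so the example runs without a cluster, and the query body copies the `filtered` form from step 2 (that query type was removed in Elasticsearch 5.0, where a `bool` query with a `filter` clause is the equivalent).

```python
def build_page_request(parentid, offset, page_size=20):
    """Request body for one page of documents with the given parentid."""
    return {
        "from": offset,
        "size": page_size,
        "query": {
            "filtered": {  # same query shape as in step 2
                "query": {"match_all": {}},
                "filter": {"term": {"parentid": parentid}},
            }
        },
    }


def fetch_all_for_parent(search, parentid, page_size=20):
    """Collect every document for one parentid by paging with from/size."""
    docs = []
    offset = 0
    while True:
        response = search(build_page_request(parentid, offset, page_size))
        hits = response["hits"]["hits"]
        docs.extend(hit["_source"] for hit in hits)
        if len(hits) < page_size:  # a short page means there is nothing left
            break
        offset += page_size
    return docs


def group_by_parent(search, parentids, page_size=20):
    """Map each parentid to the list of its documents."""
    return {
        pid: fetch_all_for_parent(search, pid, page_size) for pid in parentids
    }


# In-memory stand-in for the index, using the documents from the question.
DOCS = [
    {"name": "John", "parentid": "1234"},
    {"name": "Tim", "parentid": "2222"},
    {"name": "Mary", "parentid": "1234"},
    {"name": "Tom", "parentid": "2222"},
]


def fake_search(body):
    """Hypothetical replacement for a real search call, for illustration only."""
    pid = body["query"]["filtered"]["filter"]["term"]["parentid"]
    matches = [doc for doc in DOCS if doc["parentid"] == pid]
    page = matches[body["from"]:body["from"] + body["size"]]
    return {"hits": {"hits": [{"_source": doc} for doc in page]}}


grouped = group_by_parent(fake_search, ["2222", "1234"], page_size=1)
print(grouped)
```

Passing the search function in as an argument keeps the paging logic independent of any particular client; with a real cluster, `fake_search` would be replaced by whatever call sends the body to the index.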

Sumit answered Sep 29 '22 06:09
