Elasticsearch filter aggregations on minimal doc count

I am really new to the Elasticsearch world.

Let's say I have a nested aggregation on two fields, field1 and field2:

{
    ...
    aggs: {
        field1: {
            terms: {
                field: 'field1'
            },
            aggs: {
                field2: {
                    terms: {
                        field: 'field2'
                    }
                }
            }
        }
    }
}

This query works perfectly and gives me something like this:

aggregations: {
    field1: {
        buckets: [{
            key: "foo",
            doc_count: 123456,
            field2: {
                buckets: [{
                    key: "bar",
                    doc_count: 34323
                },{
                    key: "baz",
                    doc_count: 10
                },{
                    key: "foobar",
                    doc_count: 36785
                },
                ...
                ]
            }
        },{
            key: "fooOO",
            doc_count: 423424,
            field2: {
                buckets: [{
                    key: "bar",
                    doc_count: 35
                },{
                    key: "baz",
                    doc_count: 2435453
                },
                ...
                ]
            }
        },
        ...
        ]
    }
}

Now, I need to exclude every bucket whose doc_count is below some threshold (1000, for instance) and get this instead:

aggregations: {
    field1: {
        buckets: [{
            key: "foo",
            doc_count: 123456,
            field2: {
                buckets: [{
                    key: "bar",
                    doc_count: 34323
                },{
                    key: "foobar",
                    doc_count: 36785
                },
                ...
                ]
            }
        },{
            key: "fooOO",
            doc_count: 423424,
            field2: {
                buckets: [{
                    key: "baz",
                    doc_count: 2435453
                },
                ...
                ]
            }
        },
        ...
        ]
    }
}

Is it possible to express this in the query body, or do I have to do the filtering in the calling layer (JavaScript, in my case)?
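For comparison, filtering in the calling layer could look roughly like the sketch below. The pruneAgg helper is hypothetical (it is not part of any Elasticsearch client), and the sample data only reuses the first bucket from the response shown above, without the elided entries.

```javascript
// Hypothetical helper: returns a copy of an aggregation result in which
// every bucket with doc_count below minCount is dropped, at every level.
function pruneAgg(agg, minCount) {
  const buckets = agg.buckets
    .filter((bucket) => bucket.doc_count >= minCount)
    .map((bucket) => {
      const kept = {};
      for (const [key, value] of Object.entries(bucket)) {
        // A value that carries its own buckets array is a sub-aggregation.
        const isSubAgg =
          value !== null && typeof value === 'object' && Array.isArray(value.buckets);
        kept[key] = isSubAgg ? pruneAgg(value, minCount) : value;
      }
      return kept;
    });
  return { ...agg, buckets };
}

// Sample shaped like the response above (only the first field1 bucket).
const sample = {
  field1: {
    buckets: [
      {
        key: 'foo',
        doc_count: 123456,
        field2: {
          buckets: [
            { key: 'bar', doc_count: 34323 },
            { key: 'baz', doc_count: 10 },
            { key: 'foobar', doc_count: 36785 },
          ],
        },
      },
    ],
  },
};

const pruned = pruneAgg(sample.field1, 1000);
console.log(pruned.buckets[0].field2.buckets.map((b) => b.key));
```

This works on a response that has already been fetched, but every small bucket still has to travel over the wire first, which is why doing it in the query body is preferable when possible.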

Thanks in advance

M'sieur Toph' asked Apr 24 '15 09:04


1 Answer

Next time, M'sieur Toph': RTFM!!!

I feel really dumb: I found the answer in the manual 30 seconds after asking. I'm not deleting my question because it might help someone, who knows...

Here is the answer:

You can specify the min_doc_count property in the terms aggregation. Buckets that match fewer documents than that value are left out of the response.

That gives you:

{
    ...
    aggs: {
        field1: {
            terms: {
                field: 'field1',
                min_doc_count: 1000
            },
            aggs: {
                field2: {
                    terms: {
                        field: 'field2',
                        min_doc_count: 1000
                    }
                }
            }
        }
    }
}

You can also set a different minimum count at each level of the aggregation.
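As a minimal sketch of per-level thresholds, the snippet below builds the request body with 1000 on field1 and 500 on field2. The buildNestedTerms helper and the 500 value are made up for illustration, and size: 0 is an assumption (it asks Elasticsearch to skip the hits when only the aggregations matter).

```javascript
// Hypothetical helper (not part of any Elasticsearch client): builds nested
// terms aggregations, one level per entry, each with its own min_doc_count.
function buildNestedTerms(levels) {
  // Walk the levels from innermost to outermost, wrapping as we go.
  return levels.reduceRight((inner, { field, minDocCount }) => {
    const agg = { terms: { field, min_doc_count: minDocCount } };
    if (inner) agg.aggs = inner;
    return { [field]: agg };
  }, null);
}

const body = {
  size: 0, // assumption: only the aggregations are needed, not the hits
  aggs: buildNestedTerms([
    { field: 'field1', minDocCount: 1000 },
    { field: 'field2', minDocCount: 500 },
  ]),
};

console.log(JSON.stringify(body, null, 2));
```

The printed body has the same shape as the query above, just with a different min_doc_count on the inner level.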

What else ? :)

M'sieur Toph' answered Oct 11 '22 13:10