Elasticsearch filtering aggregated results where count greater than x

I have the following Elasticsearch query, which returns the highest average TotalMs grouped by market ID.

{
  "from": 0,
  "size": 0,
  "query": {
    "filtered": {
      "filter": {
        "and": [
          {
            "term": {
              "@type": "tradelog"
            }
          },
          {
            "range": {
              "@timestamp": {
                "gte": "now-7d",
                "lt": "now"
              }
            }
          },
          {
            "range": {
              "TotalMs": {
                "gte": 200,
                "lt": 2000
              }
            }
          }
        ]
      }
    }
  },
  "aggregations": {
    "the_name": {
      "terms": {
        "field": "Market",
        "order": { "totalms_avg": "desc" }
      },
      "aggregations": {
        "totalms_avg": {
          "avg": {
            "field": "TotalMs"
          }
        }
      }
    }
  }
}

This query returns several buckets that contain only 1 document. These are outliers in my data, so I do not want them included. Is it possible to filter out any bucket with a count of less than 5? In other words, is there an Elasticsearch equivalent of SQL's 'HAVING' clause?

Kevin Holditch asked Aug 12 '16 14:08



1 Answer

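Yes, you can use the min_doc_count setting of the terms aggregation. With a value of 5, buckets containing fewer than 5 documents are left out of the results: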

...
  "aggregations": {
    "the_name": {
      "terms": {
        "field": "Market",
        "order": { "totalms_avg": "desc" },
        "min_doc_count": 5
      },
      "aggregations": {
        "totalms_avg": {
          "avg": {
            "field": "TotalMs"
          }
        }
      }
    }
  }
}
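
For illustration only, here is a minimal Python sketch of the same idea. It builds the aggregation part of the request as a plain dict and then applies the same threshold to a list of response buckets on the client side, which shows what min_doc_count does to the result. The helper names (build_market_aggs, drop_small_buckets) and the sample buckets are hypothetical and not part of the original question or of any Elasticsearch client library. In practice the server-side setting above should be enough.

```python
def build_market_aggs(min_doc_count):
    """Build the "aggregations" part of the request body as a dict.

    Mirrors the snippet above: a terms aggregation on "Market", ordered by
    the average of "TotalMs", keeping only buckets that contain at least
    `min_doc_count` documents.
    """
    return {
        "the_name": {
            "terms": {
                "field": "Market",
                "order": {"totalms_avg": "desc"},
                "min_doc_count": min_doc_count,
            },
            "aggregations": {
                "totalms_avg": {"avg": {"field": "TotalMs"}},
            },
        }
    }


def drop_small_buckets(buckets, min_doc_count):
    """Client-side equivalent of the threshold: keep buckets whose
    doc_count is at least `min_doc_count`, preserving their order."""
    return [b for b in buckets if b["doc_count"] >= min_doc_count]


# Made-up buckets in the shape a terms aggregation returns them.
sample_buckets = [
    {"key": "A", "doc_count": 1, "totalms_avg": {"value": 1900.0}},
    {"key": "B", "doc_count": 12, "totalms_avg": {"value": 640.5}},
    {"key": "C", "doc_count": 5, "totalms_avg": {"value": 410.0}},
]

aggs = build_market_aggs(5)
kept = drop_small_buckets(sample_buckets, 5)
print([b["key"] for b in kept])  # ['B', 'C']
```

The single-document outlier bucket "A" is dropped, while "C" stays because the threshold is inclusive (a count of exactly 5 passes).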
Val answered Oct 21 '22 19:10