
Elasticsearch: how to scope aggregations to your query and filter?

I have been working with Elasticsearch queries and filters for some time, but I have never used aggregations before. Being able to scope aggregations with the query seems very useful, and I want to understand how to do it properly so that I do not make mistakes. Currently, all my search queries have this structure:

{
    "query": {

    },
    "filter": {

    },
    "from": 0,
    "size": 60
}

Now, when I added some aggregation buckets, the structure became this:

{
    "aggs": {
      "all_colors": {
        "terms": {
          "field": "color.name"
        }
      },
      "all_brands": {
        "terms": {
          "field": "brand_slug"
        }
      },
      "all_sizes": {
        "terms": {
          "field": "sizes"
        }
      }
    },
    "query": {

    },
    "filter": {

    },
    "from": 0,
    "size": 60
}

However, the aggregation results are always the same, regardless of what I put in the filter section.

When I changed the query structure to the following, the aggregations started returning different results:

{
    "aggs": {
      "all_colors": {
        "terms": {
          "field": "color.name"
        }
      },
      "all_brands": {
        "terms": {
          "field": "brand_slug"
        }
      },
      "all_sizes": {
        "terms": {
          "field": "sizes"
        }
      }
    },
    "query": {
        "filtered": {
            "query": {

            },
            "filter": {

            }        
        }
    },
    "from": 0,
    "size": 60
}

Does this mean I have to change all my search queries to this new filtered structure? Is there a workaround that gives the desired results without changing that much code?

I also observed that if my brand_slug field contains multiple words, such as "peter england", each word is returned in a separate bucket:

{
    "buckets": [
        {
           "key": "england",
           "doc_count": 368
        },
        {
           "key": "peter",
           "doc_count": 368
        }
    ]
}

How can I ensure that both words end up in the same bucket, like this:

{
    "buckets": [
        {
           "key": "peter england",
           "doc_count": 368
        }
    ]
}

UPDATE: I have solved this second part by indexing brand, color and sizes with an additional not_analyzed sub-field, like this:

"sizes": {
    "type": "string",
    "fields": {
        "raw": {
            "type": "string",
            "index": "not_analyzed"
        }
    }
}
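
With a mapping like this, the terms aggregations would presumably need to point at the raw sub-field rather than the analyzed field. A minimal sketch of that request fragment, built in Python for illustration. It assumes that color.name and brand_slug carry the same raw sub-field as sizes (only the sizes mapping is shown above), and raw_terms_aggs is a hypothetical helper name:

```python
import json


def raw_terms_aggs(fields, suffix="raw"):
    """Build one terms aggregation per field, targeting its not_analyzed sub-field."""
    return {
        name: {"terms": {"field": f"{field}.{suffix}"}}
        for name, field in fields.items()
    }


# Aggregation names and base fields are taken from the requests above.
# Assumption: every base field has a "raw" sub-field like the one mapped for sizes.
aggs = raw_terms_aggs({
    "all_colors": "color.name",
    "all_brands": "brand_slug",
    "all_sizes": "sizes",
})

print(json.dumps({"aggs": aggs}, indent=2))
```

Because the raw sub-field is not analyzed, a value such as "peter england" is indexed as a single term and should come back as one bucket.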

Mandeep Singh asked Aug 29 '15 13:08



1 Answer

What you've noticed is by design. Have a look at my answer to a similar question on SO. Basically, the input to both the aggregations section and the top-level filter section is the output of the query section, so the top-level filter narrows the hits but does not affect the aggregations. A filtered query, as you've suggested, is the best way to achieve the results you want. There is another way too: the filter aggregation. With it you would not need to change your query and filter sections; you would simply copy the filter into each aggregation. In my opinion, though, that is overkill and a violation of the DRY principle.
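
To make the two options concrete, here is a rough sketch that rewrites an existing request body either way. It builds the bodies as Python dicts purely for illustration. The helper names (to_filtered, wrap_aggs_in_filter) and the example query and filter values are hypothetical, not taken from the question. It uses the filtered query exactly as in the question; later Elasticsearch versions replace it with a bool query that has a filter clause.

```python
import copy
import json


def to_filtered(body):
    """Option 1: move the top-level "filter" into a filtered query,
    so the filter also scopes the aggregations."""
    body = copy.deepcopy(body)
    top_filter = body.pop("filter", None)
    if top_filter:
        body["query"] = {
            "filtered": {
                # Fall back to match_all when the original query is missing or empty.
                "query": body.get("query") or {"match_all": {}},
                "filter": top_filter,
            }
        }
    return body


def wrap_aggs_in_filter(body):
    """Option 2: leave query and filter untouched and repeat the filter
    inside a filter aggregation around each existing aggregation."""
    body = copy.deepcopy(body)
    top_filter = body.get("filter")
    if top_filter:
        body["aggs"] = {
            name: {"filter": copy.deepcopy(top_filter), "aggs": {name: agg}}
            for name, agg in body.get("aggs", {}).items()
        }
    return body


# Hypothetical request in the original structure (query and filter values are made up).
legacy = {
    "aggs": {"all_brands": {"terms": {"field": "brand_slug"}}},
    "query": {"match": {"name": "shirt"}},
    "filter": {"term": {"color.name": "blue"}},
    "from": 0,
    "size": 60,
}

filtered = to_filtered(legacy)
wrapped = wrap_aggs_in_filter(legacy)

print(json.dumps(filtered, indent=2))
print(json.dumps(wrapped, indent=2))
```

With the second option the buckets sit one level deeper in the response (aggregations.all_brands.all_brands.buckets in this sketch), and the filter is duplicated once per aggregation, which is the DRY concern mentioned above.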

bittusarkar answered Sep 20 '22 13:09