
Filtering, sorting and paginating by sub-aggregations in ElasticSearch 6

I have a collection of documents, where each document indicates the available rooms for a given hotel and day, and their cost for that day:

{
    "hotel_id": 2016021519381313,
    "day": "20200530",
    "rooms": [
        {
            "room_id": "00d70230ca0142a6874358919336e53f",
            "rate": 87
        },
        {
            "room_id": "675a5ec187274a45ae7a5fdc20f72201",
            "rate": 53
        }
    ]
}

The mapping is:

{
    "properties": {
        "day": {
            "type": "keyword"
        },
        "hotel_id": {
            "type": "long"
        },
        "rooms": {
            "type": "nested",
            "properties": {
                "rate": {
                    "type": "long"
                },
                "room_id": {
                    "type": "keyword"
                }
            }
        }
    }
}

I am trying to figure out how to write a query that returns the rooms available for a set of days whose total cost is less than a given amount, ordered by total cost in ascending order and paginated.
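The intended result can be made concrete with a minimal sketch in plain Python that works on documents shaped like the one above. The function name `affordable_rooms`, its parameters, and the assumption of one document per hotel and day are illustrative only and not part of any Elasticsearch API.

```python
from collections import defaultdict

def affordable_rooms(docs, days, budget, page=0, page_size=10):
    """Return (hotel_id, room_id, total) tuples for rooms that are available
    on every day in `days` and whose summed rate is below `budget`,
    cheapest first, restricted to the requested page."""
    wanted = set(days)
    totals = defaultdict(int)  # (hotel_id, room_id) -> summed rate
    counts = defaultdict(int)  # (hotel_id, room_id) -> number of matching days

    for doc in docs:
        if doc["day"] not in wanted:
            continue
        for room in doc["rooms"]:
            key = (doc["hotel_id"], room["room_id"])
            totals[key] += room["rate"]
            counts[key] += 1

    rows = [
        (hotel_id, room_id, total)
        for (hotel_id, room_id), total in totals.items()
        # Assumes one document per hotel and day, so the count equals
        # the number of requested days only if the room is always available.
        if counts[(hotel_id, room_id)] == len(wanted) and total < budget
    ]
    # Ascending by total cost; ids are used only to make ties deterministic.
    rows.sort(key=lambda row: (row[2], row[0], row[1]))

    start = page * page_size
    return rows[start:start + page_size]
```

The count check plays the same role as "min_doc_count" in an aggregation, and the final slice is the pagination step.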

So far I have found a way to get the rooms available for the set of days and their total cost: filter by the days, group by hotel and room ID, and require that the minimum document count in the aggregation equals the number of days I am looking for.

{
    "size" : 0,
    "query": {
        "bool": { 
            "must": [
                {
                    "terms" : {
                        "day" : ["20200423", "20200424", "20200425"]
                    }
                }
            ]
        } 
    } ,
    "aggs" : {
        "hotel" : {
            "terms" : { 
                "field" : "hotel_id"
            },
            "aggs" : {
                "rooms" : {
                    "nested" : {
                        "path" : "rooms"
                    },
                    "aggs" : {
                        "rooms" : {
                            "terms" : {
                                "field" : "rooms.room_id",
                                "min_doc_count" : 3
                            },
                            "aggs" : {
                                "sum_price" : { 
                                    "sum" : { "field" : "rooms.rate" } }
                            }
                        }

                    }
                }
            }
        }
    }
}

So now I am interested in ordering the result buckets in descending order at the "hotel" level, based on the value of the "rooms" sub-aggregation, and also in filtering out the buckets that do not contain enough documents or whose "sum_price" is bigger than a given budget. But I cannot work out how to do it.

I have looked at "bucket_sort", but I cannot find a way to sort based on a sub-aggregation. I have also looked at "bucket_selector", but it gives me empty buckets when they do not match the predicate. I am probably not using them correctly in my case.

What would be the right way to accomplish this?

asked Jul 24 '19 17:07 by Vlad



1 Answer

Here is the query without pagination:

{
   "size":0,
   "query":{
      "bool":{
         "must":[
            {
               "terms":{
                  "day":[
                     "20200530",
                     "20200531",
                     "20200601"
                  ]
               }
            }
         ]
      }
   },
   "aggs":{
      "rooms":{
         "nested":{
            "path":"rooms"
         },
         "aggs":{
            "rooms":{
               "terms":{
                  "field":"rooms.room_id",
                  "min_doc_count":3,
                  "order":{
                     "sum_price":"asc"
                  }
               },
               "aggs":{
                  "sum_price":{
                     "sum":{
                        "field":"rooms.rate"
                     }
                  },
                  "max_price":{
                     "bucket_selector":{
                        "buckets_path":{
                           "var1":"sum_price"
                        },
                        "script":"params.var1 < 100"
                     }
                  }
               }
            }
         }
      }
   }
}

Please note that the following values should be changed to get the desired results:

  • day: the set of days to search
  • min_doc_count: the number of days in that set
  • script in max_price: the budget threshold
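One possible way to parameterize these three values and add pagination is sketched below in Python. It builds the same request body and appends a "bucket_sort" pipeline aggregation with "from" and "size". The function name `build_query`, its parameters, and the `max_rooms` cap are hypothetical. The sketch assumes a cluster version that supports "bucket_sort" (6.1 or later), and that the "terms" size is large enough to hold every candidate room, since both pipeline aggregations only see the buckets that "terms" returns. How "bucket_selector" and "bucket_sort" interact should be verified against a real index.

```python
import json

def build_query(days, budget, page=0, page_size=10, max_rooms=10000):
    """Build a search body for rooms available on all `days` whose summed
    rate is below `budget`, sorted by total cost ascending and paginated."""
    days = list(days)
    return {
        "size": 0,
        "query": {"bool": {"must": [{"terms": {"day": days}}]}},
        "aggs": {
            "rooms": {
                "nested": {"path": "rooms"},
                "aggs": {
                    "rooms": {
                        "terms": {
                            "field": "rooms.room_id",
                            # A room must appear once per requested day.
                            "min_doc_count": len(days),
                            # Assumption: large enough to cover all rooms.
                            "size": max_rooms,
                            "order": {"sum_price": "asc"},
                        },
                        "aggs": {
                            "sum_price": {"sum": {"field": "rooms.rate"}},
                            # Drop rooms whose total reaches the budget.
                            "max_price": {
                                "bucket_selector": {
                                    "buckets_path": {"var1": "sum_price"},
                                    "script": f"params.var1 < {float(budget)}",
                                }
                            },
                            # Keep only the requested page of buckets.
                            "page": {
                                "bucket_sort": {
                                    "sort": [{"sum_price": {"order": "asc"}}],
                                    "from": page * page_size,
                                    "size": page_size,
                                }
                            },
                        },
                    }
                },
            }
        },
    }

if __name__ == "__main__":
    body = build_query(["20200530", "20200531", "20200601"], budget=100)
    print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of a search request. Formatting the budget into the script string mirrors the query above; passing it as a script parameter would be an alternative worth checking.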
answered Oct 08 '22 12:10 by damjad