Filtering, sorting and paginating by sub-aggregations in Elasticsearch 6

Tags:

elasticsearch-aggregation

I have a collection of documents, where each document indicates the available rooms for a given hotel and day, and their cost for that day:

{
    "hotel_id": 2016021519381313,
    "day": "20200530",
    "rooms": [
        {
            "room_id": "00d70230ca0142a6874358919336e53f",
            "rate": 87
        },
        {
            "room_id": "675a5ec187274a45ae7a5fdc20f72201",
            "rate": 53
        }
    ]
}

The mapping is:

{
    "properties": {
        "day": {
            "type": "keyword"
        },
        "hotel_id": {
            "type": "long"
        },
        "rooms": {
            "type": "nested",
            "properties": {
                "rate": {
                    "type": "long"
                },
                "room_id": {
                    "type": "keyword"
                }
            }
        }
    }
}

I am trying to figure out how to write a query that returns the rooms available for a set of days whose total cost is less than a given amount, ordered by total cost in ascending order and paginated.

So far I have come up with a way of getting the rooms available for the set of days together with their total cost: filter by the days, group by hotel and room ID, and require that the minimum document count in the aggregation equals the number of days I am looking for.

{
    "size" : 0,
    "query": {
        "bool": { 
            "must": [
                {
                    "terms" : {
                        "day" : ["20200423", "20200424", "20200425"]
                    }
                }
            ]
        } 
    } ,
    "aggs" : {
        "hotel" : {
            "terms" : { 
                "field" : "hotel_id"
            },
            "aggs" : {
                "rooms" : {
                    "nested" : {
                        "path" : "rooms"
                    },
                    "aggs" : {
                        "rooms" : {
                            "terms" : {
                                "field" : "rooms.room_id",
                                "min_doc_count" : 3
                            },
                            "aggs" : {
                                "sum_price" : { 
                                    "sum" : { "field" : "rooms.rate" } }
                            }
                        }

                    }
                }
            }
        }
    }
}

Now I am interested in ordering the result buckets in descending order at the "hotel" level based on the value of the "rooms" sub-aggregation, and in filtering out the buckets that do not contain enough documents or whose "sum_price" exceeds a given budget. But I cannot work out how to do it.

I have looked at "bucket_sort", but I cannot find a way to sort based on a sub-aggregation. I have also looked at "bucket_selector", but it gives me empty buckets when they do not satisfy the predicate. I am probably not using them correctly in my case.

What would be the right way to accomplish this?

asked Jul 24 '19 17:07

Vlad

1 Answer

Here is the query without pagination:

{
   "size":0,
   "query":{
      "bool":{
         "must":[
            {
               "terms":{
                  "day":[
                     "20200530",
                     "20200531",
                     "20200601"
                  ]
               }
            }
         ]
      }
   },
   "aggs":{
      "rooms":{
         "nested":{
            "path":"rooms"
         },
         "aggs":{
            "rooms":{
               "terms":{
                  "field":"rooms.room_id",
                  "min_doc_count":3,
                  "order":{
                     "sum_price":"asc"
                  }
               },
               "aggs":{
                  "sum_price":{
                     "sum":{
                        "field":"rooms.rate"
                     }
                  },
                  "max_price":{
                     "bucket_selector":{
                        "buckets_path":{
                           "var1":"sum_price"
                        },
                        "script":"params.var1 < 100"
                     }
                  }
               }
            }
         }
      }
   }
}

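Note that the following values should be changed to get the desired results:

day: the list of days to search
min_doc_count: the number of days in that list
script in max_price: the budget that the total cost must stay below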
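
The query above leaves out pagination. One possible way to add it, sketched here and not tested against the asker's index, is a "bucket_sort" pipeline aggregation placed next to the "bucket_selector". This assumes Elasticsearch 6.1 or later, where "bucket_sort" is available. The aggregation name "page", the terms "size" of 10000 and the page values "from": 0 and "size": 10 are illustrative assumptions, not values from the question. The sketch also assumes that the selector removes the over-budget buckets before the page is cut, which is worth confirming on your own cluster.

```json
{
   "size": 0,
   "query": {
      "terms": {
         "day": ["20200423", "20200424", "20200425"]
      }
   },
   "aggs": {
      "rooms": {
         "nested": { "path": "rooms" },
         "aggs": {
            "rooms": {
               "terms": {
                  "field": "rooms.room_id",
                  "min_doc_count": 3,
                  "size": 10000
               },
               "aggs": {
                  "sum_price": {
                     "sum": { "field": "rooms.rate" }
                  },
                  "max_price": {
                     "bucket_selector": {
                        "buckets_path": { "var1": "sum_price" },
                        "script": "params.var1 < 100"
                     }
                  },
                  "page": {
                     "bucket_sort": {
                        "sort": [ { "sum_price": { "order": "asc" } } ],
                        "from": 0,
                        "size": 10
                     }
                  }
               }
            }
         }
      }
   }
}
```

Because "bucket_sort" only sees the buckets that the parent "terms" aggregation returns, the terms "size" has to be large enough to cover every candidate room; otherwise later pages may be incomplete. To fetch the next page, increase "from" by the page size.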

answered Oct 08 '22 12:10

damjad
