
Elasticsearch: sort aggregation by selected field

How can I sort the output from an aggregation by a field that is in the source data, but not part of the output of the aggregation?

My source data has a date field, and I would like the output of the aggregation to be sorted by that date.

Is that possible? I've looked at using "order" within the aggregation, but I don't think it can see the date field, so it cannot use it for sorting.

I've also tried adding a sub aggregation which includes the date field, but again, I cannot get it to sort on this field.

I'm calculating a hash for each document in my ETL on the way into Elasticsearch. My data set contains a lot of duplication, so I'm using a terms aggregation on the hash field to filter out duplicates, and that works fine. I need the output of the aggregation to retain a date sort order so that I can work with it in Angular.
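For context, the hashing step could look roughly like the Python sketch below. The use of MD5 and the fields that feed the hash (`user`, `dateTime`, `action`) are assumptions for illustration, and `content_hash` is a hypothetical helper name; the real ETL may well differ.

```python
import hashlib
import json


def content_hash(doc, fields=("user", "dateTime", "action")):
    """Return a hex digest that is identical for documents with identical field values."""
    # Serialize only the chosen fields, with sorted keys, so that the key
    # order of the incoming record does not change the digest.
    payload = json.dumps({f: doc.get(f) for f in fields}, sort_keys=True)
    return hashlib.md5(payload.encode("utf-8")).hexdigest()


first = {"user": "1", "dateTime": "2001/2/20 09:12:21", "action": "Login"}
duplicate = {"action": "Login", "user": "1", "dateTime": "2001/2/20 09:12:21"}
other = {"user": "1", "dateTime": "2001/2/20 09:20:43", "action": "Logout"}
```

Two records with the same field values get the same hash regardless of key order, which is what lets a terms aggregation on the hash collapse duplicates into one bucket.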

The documents are like this:

{
  "_id": 123,
  "_source": {
    "hash": "01010101010101",
    "user": "1",
    "dateTime": "2001/2/20 09:12:21",
    "action": "Login"
  }
}

{
  "_id": 124,
  "_source": {
    "hash": "01010101010101",
    "user": "1",
    "dateTime": "2001/2/20 09:12:21",
    "action": "Login"
  }
}

{
  "_id": 132,
  "_source": {
    "hash": "0202020202020",
    "user": "1",
    "dateTime": "2001/2/20 09:20:43",
    "action": "Logout"
  }
}

{
  "_id": 200,
  "_source": {
    "hash": "0303030303030303",
    "user": "2",
    "dateTime": "2001/2/22 09:32:14",
    "action": "Login"
  }
}

So I want to use an aggregation on the hash value to remove duplicates from my set and then render the response in date order.
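To make the goal concrete, the intended result can be sketched client-side in Python on the sample documents. This only illustrates the desired output and is not how Elasticsearch computes it. `dedupe_and_sort` is a made-up helper name, and the date format string is an assumption based on the sample values.

```python
from datetime import datetime

# The four sample documents, flattened to plain dicts for brevity.
docs = [
    {"_id": 123, "hash": "01010101010101", "user": "1", "dateTime": "2001/2/20 09:12:21", "action": "Login"},
    {"_id": 124, "hash": "01010101010101", "user": "1", "dateTime": "2001/2/20 09:12:21", "action": "Login"},
    {"_id": 132, "hash": "0202020202020", "user": "1", "dateTime": "2001/2/20 09:20:43", "action": "Logout"},
    {"_id": 200, "hash": "0303030303030303", "user": "2", "dateTime": "2001/2/22 09:32:14", "action": "Login"},
]


def dedupe_and_sort(documents):
    """Keep the first document seen for each hash, then sort by dateTime ascending."""
    unique = {}
    for doc in documents:
        # setdefault leaves an existing entry untouched, so duplicates are dropped.
        unique.setdefault(doc["hash"], doc)
    return sorted(
        unique.values(),
        key=lambda d: datetime.strptime(d["dateTime"], "%Y/%m/%d %H:%M:%S"),
    )


result = dedupe_and_sort(docs)
```

Here document 124 is dropped as a duplicate of 123, and the remaining documents come back oldest first.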

My query:

{
  "query": {
    "filtered": {
      "filter": {
        "bool": {
          "must": [
            {
              "term": {
                "action": "Login"
              }
            }
          ]
        }
      }
    }
  },
  "size": 0,
  "aggs": {
    "md5": {
      "terms": {
        "field": "hash",
        "size": 0
      },
      "aggs": {
        "byDate": {
          "terms": {
            "field": "dateTime",
            "size": 0
          }
        }
      }
    }
  }
}

Currently the output is ordered on the hash and I need it ordered on the date field within each hash bucket. Is that possible?

asked Mar 13 '23 05:03 by A Dev


1 Answer

If the aggregation on "hash" is just for removing duplicates, it might work for you to simply aggregate on "dateTime" first, followed by the terms aggregation on "hash". For example:

GET my_index/test/_search
{
  "query" : {
    "filtered" : {
      "filter" : {
        "bool": {
          "must" : [
            { "term": {"action":"Login"} }
          ]
        }
      }
    }
  },
  "size": 0,
  "aggs": {
    "byDate" : {
      "terms": {
        "field" : "dateTime",
        "order": { "_term": "asc" }
      },
      "aggs": {
        "byHash": {
          "terms": {
            "field": "hash"
          }
        }
      }
    }
  }
}

This way, your results would be sorted by "dateTime" first. EDIT: the "order" on the "byDate" aggregation must be specified explicitly, because by default a terms aggregation orders its buckets by document count rather than by term.
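On the client side, the nested buckets can then be flattened into a date-ordered list of unique hashes. The Python sketch below assumes the usual terms-aggregation response shape (`aggregations.byDate.buckets[].byHash.buckets[]`). The `sample_response` is a hand-written, trimmed illustration rather than real output, and `hashes_in_date_order` is a hypothetical helper name.

```python
# Hand-written, trimmed example of what the response might look like for the
# sample documents with action "Login"; real responses carry more fields.
sample_response = {
    "aggregations": {
        "byDate": {
            "buckets": [
                {
                    "key_as_string": "2001/2/20 09:12:21",
                    "doc_count": 2,
                    "byHash": {"buckets": [{"key": "01010101010101", "doc_count": 2}]},
                },
                {
                    "key_as_string": "2001/2/22 09:32:14",
                    "doc_count": 1,
                    "byHash": {"buckets": [{"key": "0303030303030303", "doc_count": 1}]},
                },
            ]
        }
    }
}


def hashes_in_date_order(response):
    """Flatten byDate -> byHash buckets into (date, hash) pairs, keeping bucket order."""
    ordered = []
    seen = set()
    for date_bucket in response["aggregations"]["byDate"]["buckets"]:
        # Prefer the formatted date where present, otherwise use the raw key.
        label = date_bucket.get("key_as_string", date_bucket.get("key"))
        for hash_bucket in date_bucket["byHash"]["buckets"]:
            # Record each hash once, at the earliest date bucket it appears in.
            if hash_bucket["key"] not in seen:
                seen.add(hash_bucket["key"])
                ordered.append((label, hash_bucket["key"]))
    return ordered


pairs = hashes_in_date_order(sample_response)
```

Because the outer buckets already arrive in ascending date order, no extra sort is needed; the `seen` set only guards against a hash that shows up under more than one date.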

answered Mar 19 '23 11:03 by BrookeB