Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to perform date arithmetic between nested and unnested dates in Elasticsearch?

Consider the following Elasticsearch (v5.4) object (an "award" doc type):

{
  "name": "Gold 1000",
  "date": "2017-06-01T16:43:00.000+00:00",
  "recipient": {
    "name": "James Conroy",
    "date_of_birth": "1991-05-30"
  }
}

The mapping type for both award.date and award.recipient.date_of_birth is "date".

I want to perform a range aggregation to get a list of the age ranges of the recipients of this award ("Under 18", "18-24", "24-30", "30+"), at the time of their award. I tried the following aggregation query:

{
  "size": 0,
  "query": {"match_all": {}},
  "aggs": {
    "recipients": {
      "nested": {
        "path": "recipient"
      },
      "aggs": {
        "age_ranges": {
          "range": {
            "script": {
              "inline": "doc['date'].date - doc['recipient.date_of_birth'].date"
            },
            "keyed": true,
            "ranges": [{
              "key": "Under 18",
              "from": 0,
              "to": 18
            }, {
              "key": "18-24",
              "from": 18,
              "to": 24
            }, {
              "key": "24-30",
              "from": 24,
              "to": 30
            }, {
              "key": "30+",
              "from": 30,
              "to": 100
            }]
          }
        }
      }
    }
  }
}

Problem 1

But I get the following error due to the comparison of dates in the script portion:

Cannot apply [-] operation to types [org.joda.time.DateTime] and [org.joda.time.MutableDateTime].

The DateTime object is the award.date field, and the MutableDateTime object is the award.recipient.date_of_birth field. I've tried doing something like doc['recipient.date_of_birth'].date.toDateTime() (which doesn't work despite the Joda docs claiming that MutableDateTime has this method inherited from a parent class). I've also tried doing something further like this:

"script": "ChronoUnit.YEARS.between(doc['date'].date, doc['recipient.date_of_birth'].date)"

Which sadly also doesn't work :(

Problem 2

I notice if I do this:

"aggs": {
  "recipients": {
    "nested": {
      "path": "recipient"
    },
    "aggs": {
      "award_years": {
        "terms": {
          "script": {
            "inline": "doc['date'].date.year"
          }
        }
      }
    }
  }
}

I get 1970 with a doc_count that happens to equal the total number of docs in ES. This leads me to believe that accessing a property outside of the nested object simply does not work and gives me back some default like the epoch datetime. And if I do the opposite (aggregating dates of birth without nesting), I get the exact same thing for all the dates of birth instead (1970, epoch datetime). So how can I compare those two dates?

I am racking my brain here, and I feel like there's some clever solution that is just beyond my current expertise with Elasticsearch. Help!

If you want to set up a quick environment for this to help me out, here is some curl goodness:

curl -XDELETE http://localhost:9200/joelinux
curl -XPUT http://localhost:9200/joelinux -d "{\"mappings\": {\"award\": {\"properties\": {\"name\": {\"type\": \"string\"}, \"date\": {\"type\": \"date\", \"format\": \"yyyy-MM-dd'T'HH:mm:ss.SSSSSSZ\"}, \"recipient\": {\"type\": \"nested\", \"properties\": {\"name\": {\"type\": \"string\"}, \"date_of_birth\": {\"type\": \"date\", \"format\": \"yyyy-MM-dd\"}}}}}}}"
curl -XPUT http://localhost:9200/joelinux/award/1 -d '{"name": "Gold 1000", "date": "2016-06-01T16:43:00.000000+00:00", "recipient": {"name": "James Conroy", "date_of_birth": "1991-05-30"}}'
curl -XPUT http://localhost:9200/joelinux/award/2 -d '{"name": "Gold 1000", "date": "2017-02-28T13:36:00.000000+00:00", "recipient": {"name": "Martin McNealy", "date_of_birth": "1983-01-20"}}'

That should give you a "joelinux" index with two "award" docs to test this out ("James Conroy" and "Martin McNealy"). Thanks in advance!

like image 872
JoeLinux Avatar asked Jun 19 '17 14:06

JoeLinux


1 Answers

Unfortunately, you can't access nested and non-nested fields within the same context. As a workaround, you can change your mapping to automatically copy date from nested document to root context using copy_to option:

{
    "mappings": {
        "award": {
            "properties": {
                "name": {
                    "fields": {
                        "keyword": {
                            "ignore_above": 256,
                            "type": "keyword"
                        }
                    },
                    "type": "text"
                },
                "date": {
                    "type": "date"
                },
                "date_of_birth": {
                    "type": "date" // will be automatically filled when indexing documents
                },
                "recipient": {
                    "properties": {
                        "name": {
                            "fields": {
                                "keyword": {
                                    "ignore_above": 256,
                                    "type": "keyword"
                                }
                            },
                            "type": "text"
                        },
                        "date_of_birth": {
                            "type": "date",
                            "copy_to": "date_of_birth" // copy value to root document
                        }
                    },
                    "type": "nested"
                }
            }
        }
    }
}

After that you can access date of birth using path date, though the calculations to get number of years between dates are slightly tricky:

Period.between(LocalDate.ofEpochDay(doc['date_of_birth'].date.getMillis() / 86400000L), LocalDate.ofEpochDay(doc['date'].date.getMillis() / 86400000L)).getYears()

Here I convert original JodaTime date objects to system.time.LocalDate objects:

  1. Get number of milliseconds from 1970-01-01
  2. Convert to number of days from 1970-01-01 by dividing it to 86400000L (number of ms in one day)
  3. Convert to LocalDate object
  4. Create date-based Period object from two dates
  5. Get number of years between two dates.

So, the final aggregation query looks like this:

{
    "size": 0,
    "query": {
        "match_all": {}
    },
    "aggs": {
        "age_ranges": {
            "range": {
                "script": {
                    "inline": "Period.between(LocalDate.ofEpochDay(doc['date_of_birth'].date.getMillis() / 86400000L), LocalDate.ofEpochDay(doc['date'].date.getMillis() / 86400000L)).getYears()"
                },
                "keyed": true,
                "ranges": [
                    {
                        "key": "Under 18",
                        "from": 0,
                        "to": 18
                    },
                    {
                        "key": "18-24",
                        "from": 18,
                        "to": 24
                    },
                    {
                        "key": "24-30",
                        "from": 24,
                        "to": 30
                    },
                    {
                        "key": "30+",
                        "from": 30,
                        "to": 100
                    }
                ]
            }
        }
    }
}
like image 187
Random Avatar answered Oct 22 '22 02:10

Random