
Elasticsearch sort based on element in array that satisfies filter

My type has a field which is an array of times in ISO 8601 format. I want to get all the listings that have a time on a certain day, and then order them by the earliest time at which they occur on that specific day. The problem is that my query orders them by the earliest time across all days.

You can reproduce the problem below.

curl -XPUT 'localhost:9200/listings?pretty'

curl -XPOST 'localhost:9200/listings/listing/_bulk?pretty' -d '
{"index": { } }
{ "name": "second on 6th (3rd on the 5th)", "times": ["2018-12-05T12:00:00","2018-12-06T11:00:00"] }
{"index": { } }
{ "name": "third on 6th (1st on the 5th)", "times": ["2018-12-05T10:00:00","2018-12-06T12:00:00"] }
{"index": { } }
{ "name": "first on the 6th (2nd on the 5th)", "times": ["2018-12-05T11:00:00","2018-12-06T10:00:00"] }
'

# give ES time to refresh the index so the new documents are searchable
sleep 2

echo "Query listings on the 6th!"

curl -XPOST 'localhost:9200/listings/_search?pretty' -d '
{
  "sort": {
    "times": {
      "order": "asc",
      "nested_filter": {
        "range": {
          "times": {
            "gte": "2018-12-06T00:00:00",
            "lte": "2018-12-06T23:59:59"
          }
        }
      }
    }
  },
  "query": {
    "bool": {
      "filter": {
        "range": {
          "times": {
            "gte": "2018-12-06T00:00:00",
            "lte": "2018-12-06T23:59:59"
          }
        }
      }
    }
  }
}'

curl -XDELETE 'localhost:9200/listings?pretty'

Saving the above script to a .sh file and running it reproduces the issue. You'll see the ordering is based on the 5th and not the 6th. Elasticsearch converts the times to an epoch_millis number for sorting; you can see the epoch number in the sort field of each hit, e.g. 1544007600000. When doing an asc sort, it takes the smallest number in the array (the order of the array is not important) and sorts based on that.
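To make that behaviour concrete, here is a minimal Python sketch that imitates it outside of Elasticsearch. It is only a model of the sort described above, not Elasticsearch code: `to_epoch_millis` and `default_sort_key` are hypothetical helper names, and the timestamps are assumed to be UTC because they carry no zone.

```python
from datetime import datetime, timezone


def to_epoch_millis(iso: str) -> int:
    """Parse a zone-less ISO 8601 timestamp as UTC and return epoch milliseconds."""
    dt = datetime.strptime(iso, "%Y-%m-%dT%H:%M:%S").replace(tzinfo=timezone.utc)
    return int(dt.timestamp()) * 1000


# Same documents as in the bulk request above.
listings = [
    {"name": "second on 6th (3rd on the 5th)",
     "times": ["2018-12-05T12:00:00", "2018-12-06T11:00:00"]},
    {"name": "third on 6th (1st on the 5th)",
     "times": ["2018-12-05T10:00:00", "2018-12-06T12:00:00"]},
    {"name": "first on the 6th (2nd on the 5th)",
     "times": ["2018-12-05T11:00:00", "2018-12-06T10:00:00"]},
]


def default_sort_key(listing: dict) -> int:
    # An ascending sort on a multi-valued field uses the smallest value
    # in the whole array, regardless of which day it falls on.
    return min(to_epoch_millis(t) for t in listing["times"])


ordered = sorted(listings, key=default_sort_key)
for listing in ordered:
    print(default_sort_key(listing), listing["name"])
```

The sort keys all come from the 5th, which is why the listings come back in "1st on the 5th" order even though the query only matches on the 6th.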

Somehow I need it to be ordered by the earliest time that occurs on the queried day, i.e. the 6th.

Currently using Elasticsearch 2.4 but even if someone can show me how it's done in the current version that would be great.

Here is their doc on nested queries and scripting if that helps.

Albert Still asked Oct 16 '22 09:10


1 Answer

I think the problem here is that the nested sorting is meant for nested objects, not for arrays.

If you convert the document into one that uses an array of nested objects instead of the simple array of dates, then you can construct a nested filtered sort that works.

The following is for Elasticsearch 6.0. They changed the syntax a bit from 6.1 onwards, and I'm not sure how much of this works with 2.x:

Mappings:

PUT nested-listings
{
  "mappings": {
    "listing": {
      "properties": {
        "name": {
          "type": "keyword"
        },
        "openTimes": {
          "type": "nested",
          "properties": {
            "date": {
              "type": "date"
            }
          }
        }
      }
    }
  }
}

Data:

POST nested-listings/listing/_bulk
{"index": { } }
{ "name": "second on 6th (3rd on the 5th)", "openTimes": [ { "date": "2018-12-05T12:00:00" }, { "date": "2018-12-06T11:00:00" }] }
{"index": { } }
{ "name": "third on 6th (1st on the 5th)", "openTimes": [ {"date": "2018-12-05T10:00:00"}, { "date": "2018-12-06T12:00:00" }] }
{"index": { } }
{ "name": "first on the 6th (2nd on the 5th)", "openTimes": [ {"date": "2018-12-05T11:00:00" }, { "date": "2018-12-06T10:00:00" }] }

So instead of the plain "times" array, we have an "openTimes" nested field, and each listing contains an array of openTimes objects, each with its own "date".

Now the search:

POST nested-listings/_search
{
  "sort": {
    "openTimes.date": {
      "order": "asc",
      "nested_path": "openTimes",
      "nested_filter": {
        "range": {
          "openTimes.date": {
            "gte": "2018-12-06T00:00:00",
            "lte": "2018-12-06T23:59:59"
          }
        }
      }
    }
  },
  "query": {
    "nested": {
      "path": "openTimes",
      "query": {
        "bool": {
          "filter": {
            "range": {
              "openTimes.date": {
                "gte": "2018-12-06T00:00:00",
                "lte": "2018-12-06T23:59:59"
              }
            }
          }
        }
      }
    }
  }
}

The main difference here is the slightly different query, since you need to use a "nested" query to filter on nested objects.

And this gives the following result:

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": null,
    "hits": [
      {
        "_index": "nested-listings",
        "_type": "listing",
        "_id": "vHH6e2cB28sphqox2Dcm",
        "_score": null,
        "_source": {
          "name": "first on the 6th (2nd on the 5th)"
        },
        "sort": [
          1544090400000
        ]
      },
      {
        "_index": "nested-listings",
        "_type": "listing",
        "_id": "unH6e2cB28sphqox2Dcm",
        "_score": null,
        "_source": {
          "name": "second on 6th (3rd on the 5th)"
        },
        "sort": [
          1544094000000
        ]
      },
      {
        "_index": "nested-listings",
        "_type": "listing",
        "_id": "u3H6e2cB28sphqox2Dcm",
        "_score": null,
        "_source": {
          "name": "third on 6th (1st on the 5th)"
        },
        "sort": [
          1544097600000
        ]
      }
    ]
  }
}
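As a sanity check on those sort values, here is a small Python sketch of what the filtered nested sort is doing conceptually: keep only the nested dates that fall inside the range, then take the smallest of what is left. This is a model under assumptions, not how Elasticsearch implements it; `to_epoch_millis` and `filtered_sort_key` are hypothetical names, and the timestamps are treated as UTC.

```python
from datetime import datetime, timezone


def to_epoch_millis(iso: str) -> int:
    """Parse a zone-less ISO 8601 timestamp as UTC and return epoch milliseconds."""
    dt = datetime.strptime(iso, "%Y-%m-%dT%H:%M:%S").replace(tzinfo=timezone.utc)
    return int(dt.timestamp()) * 1000


# Bounds taken from the range filter in the search request.
DAY_START = to_epoch_millis("2018-12-06T00:00:00")
DAY_END = to_epoch_millis("2018-12-06T23:59:59")

# Same documents as in the nested bulk request.
listings = [
    {"name": "second on 6th (3rd on the 5th)",
     "openTimes": [{"date": "2018-12-05T12:00:00"}, {"date": "2018-12-06T11:00:00"}]},
    {"name": "third on 6th (1st on the 5th)",
     "openTimes": [{"date": "2018-12-05T10:00:00"}, {"date": "2018-12-06T12:00:00"}]},
    {"name": "first on the 6th (2nd on the 5th)",
     "openTimes": [{"date": "2018-12-05T11:00:00"}, {"date": "2018-12-06T10:00:00"}]},
]


def dates_in_range(listing: dict) -> list:
    # Only nested objects that pass the filter take part in the sort.
    millis = (to_epoch_millis(o["date"]) for o in listing["openTimes"])
    return [ms for ms in millis if DAY_START <= ms <= DAY_END]


def filtered_sort_key(listing: dict) -> int:
    # Ascending order: the smallest of the dates that survived the filter.
    return min(dates_in_range(listing))


# The nested query drops listings with no date on the queried day.
matching = [l for l in listings if dates_in_range(l)]
ordered = sorted(matching, key=filtered_sort_key)
for listing in ordered:
    print(filtered_sort_key(listing), listing["name"])
```

The three keys it prints line up with the sort arrays in the response above, since each listing now contributes only its time on the 6th.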

I don't think you can actually select a single value from an array in ES, so with a plain array you were always going to be sorting on all of its values. The best you can do with a plain array is choose how that array is reduced to one value for sorting purposes (the sort mode: lowest, highest, average, etc.).
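For 6.1 and later, my understanding is that the sort options "nested_path" and "nested_filter" were replaced by a single "nested" object with "path" and "filter" keys. I have not run this against a newer cluster, so treat the following as a sketch of the request body only. It is written in Python just so the shared range filter is defined once and the JSON can be printed; `day_range` and `search_body` are hypothetical names.

```python
import json

# The range filter is used twice: once to select documents, once to pick
# which nested dates are allowed to take part in the sort.
day_range = {
    "range": {
        "openTimes.date": {
            "gte": "2018-12-06T00:00:00",
            "lte": "2018-12-06T23:59:59",
        }
    }
}

search_body = {
    "sort": {
        "openTimes.date": {
            "order": "asc",
            # Assumed 6.1+ form: "nested" replaces nested_path / nested_filter.
            "nested": {"path": "openTimes", "filter": day_range},
        }
    },
    "query": {
        "nested": {
            "path": "openTimes",
            "query": {"bool": {"filter": day_range}},
        }
    },
}

# Print the body so it can be pasted into a POST nested-listings/_search request.
print(json.dumps(search_body, indent=2))
```

The query part is unchanged from the 6.0 version; only the sort clause differs.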

Stuart Herring answered Oct 21 '22 03:10