
Elasticsearch - Filter where (one of nested array) and (all of nested array)

TL;DR - How do I check whether one-of and all-of a nested array meet specified criteria?

I have a document. Each document has an array of nested outer objects, which themselves have a list of nested inner objects. I need to filter for all documents where at least one of the document's outer nested objects matches. By "matches", I mean that all of that outer object's inner objects match in some way. Here's an example mapping for reference:

{ "document" : {
    "properties" : {
      "name" : {
        "type" : "string"
      },
      "outer" : {
        "type" : "nested",
        "properties" : {
          "inner" : {
            "type" : "nested",
            "properties" : {
              "match" : {
                "type" : "string",
                "index" : "not_analyzed"
              },
              "type" : {
                "type" : "string",
                "index" : "not_analyzed"
              }
    }}}}}}
}

If the document has no outer/inner objects, it is considered to match. To make things worse, the inner objects need to be matched differently depending on their type, in a kind of conditional logic (e.g. CASE in SQL). For example, if the type were the term "Country", then the inner object would be considered to match if its match value were a specified country code such as ES. A document may have inner objects of varying type, and there is no guarantee that specific types will exist.

Coming from an imperative (Java) programming background, I am having incredible trouble figuring out how to implement this kind of filtering. Nothing I can think of even vaguely matches this behaviour. Thus far, all I have is the filtered query:

{
  "filtered" : {
    "query" : {
      "match_all" : { }
    },
    "filter" : {
      "bool" : {
        "should" : {
          "missing" : {
            "field" : "outer.inner.type"
          }
        }
      }
    }
  }
}

So, the question is...

How can I filter for documents that have at least one outer object in which all inner objects match, based on the type of each inner object?

Further details, by request:

Example Document JSON

{
    "name":"First",
    "outer":[
        {
            "inner":[
                {"match":"ES","type":"Country"},
                {"match":"Elite","type":"Market"}
            ]
        },{
            "inner":[
                {"match":"GBR","type":"Country"},
                {"match":"1st Class","type":"Market"},
                {"match":"Admin","type":"Role"}
            ]
        }
    ],
    "lockVersion":0,"sourceId":"1"
}

The above example should come through the filter if we were to provide the market "1st Class" and the country "GBR", because the second of the two outer objects would be considered a match because both inner objects match. If, however, we provided the country "GBR" and the market "Elite", then this document would not be returned, because neither of the outer objects would have both of their inner objects match in their entirety. If we wanted the second outer object to match, then all three inner objects would need to match. Take note that there is an extra type in the third inner object. This leads to a situation where, if a type exists, then it needs to have a match for it; otherwise it doesn't need to match because it is absent.

Rudi Kershaw asked Sep 16 '15 12:09



1 Answer

One of Nested Array

Matching at least one object of a nested array turns out to be very simple. A nested filter evaluates to matching/true if any object in the array of nested objects matches the specified inner filter. For example, given an array of outer objects where one of those objects has a field match with the value "matching", the following would be considered true.

"nested": {
   "path": "outer",
   "filter": {
       "term" : { "match" : "matching" } 
   }
}

The above will be considered true/matching if one of the nested outer objects has a field called match with the value "matching".
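To make the "any of" semantics concrete, here is a small Python model of how the nested filter above behaves. It is only a sketch of the logic, not Elasticsearch code; `nested_filter_matches`, `is_matching` and the sample data are hypothetical names chosen for illustration.

```python
def nested_filter_matches(nested_objects, predicate):
    # Model of a nested filter: true if ANY nested object passes the predicate.
    return any(predicate(obj) for obj in nested_objects)


def is_matching(obj):
    # Stand-in for the term filter {"match": "matching"}.
    return obj.get("match") == "matching"


outer = [{"match": "other"}, {"match": "matching"}]

print(nested_filter_matches(outer, is_matching))                  # True
print(nested_filter_matches([{"match": "other"}], is_matching))   # False
print(nested_filter_matches([], is_matching))                     # False
```

Note the last case: with no nested objects at all, nothing can match, so the filter is false.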

All of Nested Array

Having a nested filter be considered matching only if all of the nested objects in an array match is more interesting. In fact, it's impossible to express directly. But given that a nested filter is considered matching if at least one of the nested objects matches, we can reverse the logic and say "none of the nested objects fail to match" to achieve what we need. For example, given an array of nested outer.inner objects where all of those objects have a field match with the value "matching", the following would be considered true.

"not" : {
   "nested": {
      "path": "outer.inner",
      "filter": {
          "not" : {
              "term" : { "match" : "matching" } 
          }
      }
   }
}

The above will be considered true/matching because none of the nested outer.inner objects don't (double negative) have a field called match with the value "matching". This, of course, is the same as all of the nested inner objects having a field match with the value "matching".
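The double negative is just De Morgan's law: "all match" is the same as "not any that don't match". A minimal Python sketch of that equivalence follows; the function names and sample data are hypothetical and only model the logic of the filter.

```python
def is_matching(obj):
    # Stand-in for the term filter {"match": "matching"}.
    return obj.get("match") == "matching"


def any_match(nested_objects, predicate):
    # Model of a plain nested filter: true if at least one object passes.
    return any(predicate(obj) for obj in nested_objects)


def all_match(nested_objects, predicate):
    # Model of not -> nested -> not: true if no object fails the predicate.
    return not any_match(nested_objects, lambda obj: not predicate(obj))


uniform = [{"match": "matching"}, {"match": "matching"}]
mixed = [{"match": "matching"}, {"match": "other"}]

print(all_match(uniform, is_matching))  # True
print(all_match(mixed, is_matching))    # False
print(all_match([], is_matching))       # True (no objects, so nothing fails)
```

The empty case evaluating to true is convenient here, since the question treats documents without nested objects as matching.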

Missing Any Nested Objects

You can't check whether a field containing nested objects is missing using the traditional missing filter. This is because nested objects aren't actually stored in the document itself; they are indexed separately as hidden documents. As such, a missing filter on a nested field will always be considered true. What you can do, however, is check that a match_all filter finds no nested objects, like so:

"not": {
   "nested": {
      "path": "outer",
      "filter": {
          "match_all": {}
      }
   }
}

This is considered true/matching if match_all finds no results.
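One way the three pieces above might be combined for the original question is sketched below in Python, which builds the filter as a dictionary. This is untested against a real cluster and rests on several assumptions: that a nested filter on outer.inner placed inside a nested filter on outer is evaluated per outer object, that fully qualified field names (outer.inner.type) are used, and that an inner object whose type has no supplied criterion counts as a non-match. The helpers `inner_mismatch_filter`, `build_filter` and `document_matches` are hypothetical names; `document_matches` is a pure-Python model used only to check the logic against the example document.

```python
import json


def inner_mismatch_filter(criteria):
    # True for an inner object that is NOT satisfied by the criteria.
    # criteria maps an inner "type" to its required "match" value.
    accepted = [
        {"bool": {"must": [
            {"term": {"outer.inner.type": inner_type}},
            {"term": {"outer.inner.match": value}},
        ]}}
        for inner_type, value in sorted(criteria.items())
    ]
    if not accepted:
        return {"match_all": {}}  # no criteria: every inner object mismatches
    return {"not": {"bool": {"should": accepted}}}


def build_filter(criteria):
    # All of: no inner object of this outer object mismatches.
    all_inner = {"not": {"nested": {
        "path": "outer.inner",
        "filter": inner_mismatch_filter(criteria),
    }}}
    # One of: at least one outer object passes the all-of check.
    one_outer = {"nested": {"path": "outer", "filter": all_inner}}
    # Missing: the document has no outer objects at all.
    no_outer = {"not": {"nested": {
        "path": "outer",
        "filter": {"match_all": {}},
    }}}
    return {"bool": {"should": [no_outer, one_outer]}}


def document_matches(doc, criteria):
    # Pure-Python model of the intended semantics, for checking the logic.
    outers = doc.get("outer", [])
    if not outers:
        return True
    return any(
        all(criteria.get(inner["type"]) == inner["match"]
            for inner in outer.get("inner", []))
        for outer in outers
    )


doc = {
    "name": "First",
    "outer": [
        {"inner": [{"match": "ES", "type": "Country"},
                   {"match": "Elite", "type": "Market"}]},
        {"inner": [{"match": "GBR", "type": "Country"},
                   {"match": "1st Class", "type": "Market"},
                   {"match": "Admin", "type": "Role"}]},
    ],
}

all_three = {"Country": "GBR", "Market": "1st Class", "Role": "Admin"}
print(document_matches(doc, all_three))                               # True
print(document_matches(doc, {"Country": "GBR", "Market": "Elite"}))   # False
print(json.dumps(build_filter({"Country": "GBR"}), indent=2))
```

The dictionary returned by `build_filter` would go in the filter part of a filtered query. If inner objects of an unlisted type should be ignored rather than treated as failures, `inner_mismatch_filter` would need an extra clause restricting the mismatch check to the listed types.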

Rudi Kershaw answered Sep 16 '22 15:09