
elasticsearch - comparing a nested field with another field in the document

I need to compare 2 fields in the same document where the actual value does not matter. Consider this document:

_source: {
    id: 123,
    primary_content_type_id: 12,
    content: [
        {
            id: 4,
            content_type_id: 1,
            assigned: true
        },
        {
            id: 5,
            content_type_id: 12,
            assigned: false
        }
    ]
}

I need to find all documents in which the primary content is not assigned. I cannot find a way to compare primary_content_type_id with the nested content.content_type_id to ensure they hold the same value. This is what I have tried, using a script. I do not think I fully understand scripts, but they may be a way to solve this problem:

{
    "filter": {
        "nested": {
            "path": "content",
            "filter": {
                "bool": {
                    "must": [
                        {
                            "term": {
                                "content.assigned": false
                            }
                        },
                        {
                            "script": {
                                "script": "primary_content_type_id==content.content_type_id"
                            }
                        }
                    ]
                }
            }
        }
    }
}

Note that it works fine if I remove the script portion of the filter, replace it with another term filter where content.content_type_id = 12, and add a further filter where primary_content_type_id = 12. The problem is that I will not know (nor does it matter for my use case) what the values of primary_content_type_id or content.content_type_id are. All that matters is that assigned is false for the content whose content_type_id matches the primary_content_type_id.

Is this check possible with elasticsearch?

Asked by Spencer on Nov 20 '14 at 21:11



1 Answer

In the case of the nested search, you are searching the nested objects without the parent. Unfortunately, there is no hidden join that you can apply with nested objects.

At least currently, that means the script does not receive both the "parent" and the nested document. You can confirm this by replacing your script with each of these in turn and testing the result:

# Parent document field: not visible inside the nested filter, so nothing matches
"script": {
  "script": "doc['primary_content_type_id'].value == 12"
}

# Nested document field: visible inside the nested filter, so this should match
"script": {
  "script": "doc['content.content_type_id'].value == 12"
}
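
One way to picture the result of those two tests is the plain-Python model below. It is not Elasticsearch code, only a sketch under the assumption that each nested object is evaluated as its own hidden document carrying just its own fields. The helper names `nested_views` and `script_matches` are hypothetical and exist only for this illustration.

```python
# Plain-Python model (not Elasticsearch code) of what a nested filter can see.
# Assumption: each nested object is evaluated as its own hidden document that
# carries only its own fields, prefixed with the nested path.

document = {
    "id": 123,
    "primary_content_type_id": 12,
    "content": [
        {"id": 4, "content_type_id": 1, "assigned": True},
        {"id": 5, "content_type_id": 12, "assigned": False},
    ],
}


def nested_views(doc, path):
    """Return one field map per nested object, as the nested filter would see it."""
    return [
        {f"{path}.{key}": value for key, value in inner.items()}
        for inner in doc[path]
    ]


def script_matches(view, field, expected):
    """Model of doc[field].value == expected; a missing field never matches here."""
    return view.get(field) == expected


views = nested_views(document, "content")

# The parent field is absent from every nested view.
parent_visible = any(
    script_matches(view, "primary_content_type_id", 12) for view in views
)
# The nested field is present in the nested views.
nested_visible = any(
    script_matches(view, "content.content_type_id", 12) for view in views
)

print(parent_visible, nested_visible)  # False True
```

In this model the comparison from the question can never succeed, because no single view holds both primary_content_type_id and content.content_type_id.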

You could do this in a less efficient way by looping over the objects yourself in a script, rather than having ES do it for you through the nested filter. That means you would have to reindex so that each parent and its nested objects form a single document. Given how you are trying to use the data, this probably would not be too different, and it may even perform better (especially given the lack of an alternative).

# This assumes that your default scripting language is Groovy (default in 1.4)
# Note1: "find" will loop across all of the values, but it will
#  appropriately short circuit if it finds any!
# Note2: It would be preferable to use doc throughout, but since we need the
#  arrays (plural!) to be in the _same_ order, then we need to parse the
#  _source. This inherently means that you must _store_ the _source, which
#  is the default. Parsing the _source only happens on the first touch.
"script": {
  "script": "_source.content.find { it.content_type_id == _source.primary_content_type_id && ! it.assigned } != null",
  "_cache" : true
}

I cached the result because nothing dynamic is occurring here (e.g., not comparing dates to now for instance), so it's pretty safe to cache, thereby making future lookups much faster. Most filters are cached by default, but scripts are one of the few exceptions.
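
To check the logic of the script above outside Elasticsearch, the predicate can be mirrored in plain Python. This is only a sketch of the same comparison, assuming the _source has the shape shown in the question; the function name `primary_content_unassigned` is made up for this example.

```python
# Plain-Python equivalent of the Groovy predicate, for checking the logic only.

def primary_content_unassigned(source):
    """True if some content entry has the primary type and is not assigned."""
    primary = source["primary_content_type_id"]
    # any() stops at the first matching entry, much like Groovy's find.
    return any(
        item["content_type_id"] == primary and not item["assigned"]
        for item in source["content"]
    )


# The document from the question: primary content (type 12) is not assigned.
unassigned_doc = {
    "id": 123,
    "primary_content_type_id": 12,
    "content": [
        {"id": 4, "content_type_id": 1, "assigned": True},
        {"id": 5, "content_type_id": 12, "assigned": False},
    ],
}

# A variation where the primary content is assigned and another entry is not.
assigned_doc = {
    "id": 124,
    "primary_content_type_id": 12,
    "content": [
        {"id": 6, "content_type_id": 1, "assigned": False},
        {"id": 7, "content_type_id": 12, "assigned": True},
    ],
}

print(primary_content_unassigned(unassigned_doc))  # True
print(primary_content_unassigned(assigned_doc))    # False
```

The second document shows why both conditions have to be tested on the same entry: it contains an unassigned entry and an entry of the primary type, but they are not the same one.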

Since the script must compare both values to be sure that it found the correct inner object, you are duplicating some amount of work, but that is practically unavoidable. Keeping the term filter alongside the script will most likely perform better than running the script check on its own.
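
Putting the pieces together, a request body that keeps the nested term filter and adds the script filter at the root level might look like the sketch below. It is built as a Python dict purely so it can be serialized and inspected, and it assumes the 1.x-style filtered query and bool filter; the exact wrapping may need adjusting for your version and mapping.

```python
import json

# Sketch of a combined request body, assuming Elasticsearch 1.x filter syntax.
# The script string is the one from the answer; everything else is wrapping.
script = (
    "_source.content.find { it.content_type_id == "
    "_source.primary_content_type_id && ! it.assigned } != null"
)

query = {
    "query": {
        "filtered": {
            "filter": {
                "bool": {
                    "must": [
                        # Narrow to documents with at least one unassigned entry.
                        {
                            "nested": {
                                "path": "content",
                                "filter": {"term": {"content.assigned": False}},
                            }
                        },
                        # Root-level script: confirms the unassigned entry is
                        # the one whose type matches the primary type.
                        {"script": {"script": script, "_cache": True}},
                    ]
                }
            }
        }
    }
}

body = json.dumps(query, indent=2)
print(body)
```

The script filter sits outside the nested filter on purpose, since inside it the parent's primary_content_type_id would not be available.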

Answered by pickypg on Sep 20 '22 at 15:09