Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Querying Multi Level Nested fields on Elastic Search

I'm new to Elastic Search and to the non-SQL paradigm. I've been following ES tutorial, but there is one thing I couldn't put to work.

In the following code (I'me using PyES to interact with ES) I create a single document, with a nested field (subjects), that contains another nested field (concepts).

from pyes import *

conn = ES('127.0.0.1:9200')  # Use HTTP

# Delete and Create a new index.
conn.indices.delete_index("documents-index")
conn.create_index("documents-index")

# Create a single document.
document = {
    "docid": 123456789,
    "title": "This is the doc title.",
    "description": "This is the doc description.",
    "datepublished": 2005,
    "author": ["Joe", "John", "Charles"],
    "subjects": [{
                    "subjectname": 'subject1',
                    "subjectid": [210, 311, 1012, 784, 568],
                    "subjectkey": 2,
                    "concepts": [
                                    {"name": "concept1", "score": 75},
                                    {"name": "concept2", "score": 55}
                                  ]
                },
                {
                    "subjectname": 'subject2',
                    "subjectid": [111, 300, 141, 457, 748],
                    "subjectkey": 0,
                    "concepts": [
                                    {"name": "concept3", "score": 88},
                                    {"name": "concept4", "score": 55},
                                    {"name": "concept5", "score": 66}
                                  ]
                }],
    }


# Define the nested elements.
mapping1 = {
            'subjects': {
                'type': 'nested'
            }
        }
mapping2 = {
            'concepts': {
                'type': 'nested'
            }
        }
conn.put_mapping("document", {'properties': mapping1}, ["documents-index"])
conn.put_mapping("subjects", {'properties': mapping2}, ["documents-index"])


# Insert document in 'documents-index' index.
conn.index(document, "documents-index", "document", 1)

# Refresh connection to make queries.
conn.refresh()

I'm able to query subjects nested field:

query1 = {
    "nested": {
        "path": "subjects",
        "score_mode": "avg",
        "query": {
            "bool": {
                "must": [
                    {
                        "text": {"subjects.subjectname": "subject1"}
                    },
                    {
                        "range": {"subjects.subjectkey": {"gt": 1}}
                    }
                ]
            }
        }
    }
}


results = conn.search(query=query1)
for r in results:
    print r  # as expected, it returns the entire document.

but I can't figure out how to query based on concepts nested field.

ES documentation refers that

Multi level nesting is automatically supported, and detected, resulting in an inner nested query to automatically match the relevant nesting level (and not root) if it exists within another nested query.

So, I tryed to build a query with the following format:

query2 = {
        "nested": {
            "path": "concepts",
            "score_mode": "avg",
            "query": {
                "bool": {
                    "must": [
                        {
                            "text": {"concepts.name": "concept1"}
                        },
                        {
                           "range": {"concepts.score": {"gt": 0}}
                        }
                    ]
                }
            }
        }
}

which returned 0 results.

I can't figure out what is missing and I haven't found any example with queries based on two levels of nesting.

like image 673
JCJS Avatar asked Jan 31 '13 19:01

JCJS


2 Answers

Ok, after trying a tone of combinations, I finally got it using the following query:

query3 = {
    "nested": {
        "path": "subjects",
        "score_mode": "avg",
        "query": {
            "bool": {
                "must": [
                    {
                        "text": {"subjects.concepts.name": "concept1"}
                    }
                ]
            }
        }
    }
}

So, the nested path attribute (subjects) is always the same, no matter the nested attribute level, and in the query definition I used the attribute's full path (subject.concepts.name).

like image 52
JCJS Avatar answered Oct 06 '22 00:10

JCJS


Shot in the dark since I haven't tried this personally, but have you tried the fully qualified path to Concepts?

query2 = {
       "nested": {
           "path": "subjects.concepts",
           "score_mode": "avg",
           "query": {
               "bool": {
                   "must": [
                       {
                           "text": {"subjects.concepts.name": "concept1"}
                       },
                       {
                          "range": {"subjects.concepts.score": {"gt": 0}}
                       }
                   ]
               }
           }
       }
    } 
like image 30
Zach Avatar answered Oct 06 '22 00:10

Zach