
How to do source filtering on Nested Fields

Sample document

{
  "id" : "video1",
  "title" : "Gone with the wind",
  "timedTextLines" : [
    {
      "startTime" : "00:00:02",
      "endTime" : "00:00:05",
      "textLine" : "Frankly my dear I don't give a damn."
    },
    {
      "startTime" : "00:00:07",
      "endTime" : "00:00:09",
      "textLine" : " my amazing country."
    },
    {
      "startTime" : "00:00:17",
      "endTime" : "00:00:29",
      "textLine" : " amazing country."
    }
  ]
}

Index Definition

{
  "mappings": {
    "video_type": {
      "properties": {
        "timedTextLines": {
          "type": "nested" 
        }
      }
    }
  }
}

Without source filtering in inner_hits, the query works fine and returns this response:

{
  "took": 5,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.91737854,
    "hits": [
      {
        "_index": "video_index",
        "_type": "video_type",
        "_id": "1",
        "_score": 0.91737854,
        "_source": {

        },
        "inner_hits": {
          "timedTextLines": {
            "hits": {
              "total": 1,
              "max_score": 0.6296964,
              "hits": [
                {
                  "_nested": {
                    "field": "timedTextLines",
                    "offset": 0
                  },
                  "_score": 0.6296964,
                  "_source": {
                    "startTime": "00:00:02",
                    "endTime": "00:00:05",
                    "textLine": "Frankly my dear I don't give a damn."
                  },
                  "highlight": {
                    "timedTextLines.textLine": [
                      "Frankly my dear I don't give a <em>damn</em>."
                    ]
                  }
                }
              ]
            }
          }
        }
      }
    ]
  }
}

The response contains all the properties of the nested object, namely startTime, endTime and textLine. How can I return just startTime and endTime in the inner hits?

Failed query

{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "title": "gone"
          }
        },
        {
          "nested": {
            "path": "timedTextLines",
            "query": {
              "match": {
                "timedTextLines.textLine": "damn"
              }
            },
            "inner_hits": {
             "_source":["startTime","endTime"],
              "highlight": {
                "fields": {
                  "timedTextLines.textLine": {

                  }
                }
              }
            }
          }
        }
      ]
    }
  },
  "_source":"false"
}

Error

HTTP/1.1 400 Bad Request
content-type: application/json; charset=UTF-8
content-length: 265

{"error":{"root_cause":[{"type":"illegal_argument_exception","reason":"[inner_hits] _source doesn't support values of type: START_ARRAY"}],"type":"illegal_argument_exception","reason":"[inner_hits] _source doesn't support values of type: START_ARRAY"},"status":400}

R.D asked Jan 10 '17 05:01




1 Answer

This fails because, since ES 5.0, _source inside inner_hits no longer supports the short form (such as a bare array of field names). It only accepts the full object form, with includes and excludes (see this open issue).

Your query can be rewritten like this and it will work:

{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "title": "gone"
          }
        },
        {
          "nested": {
            "path": "timedTextLines",
            "query": {
              "match": {
                "timedTextLines.textLine": "damn"
              }
            },
            "inner_hits": {
             "_source": {
                "includes":[
                  "timedTextLines.startTime",
                  "timedTextLines.endTime"
                ]
             },
              "highlight": {
                "fields": {
                  "timedTextLines.textLine": {

                  }
                }
              }
            }
          }
        }
      ]
    }
  },
  "_source":"false"
}
Val answered Nov 13 '22 04:11