Fetching aggregations of nested documents in ElasticSearch 5.6.3 leads to Lucene exception

I have an issue with aggregations of nested documents on ElasticSearch 5.6.3.

My query is structured in the following way:

  query
  aggs
  |_filter
    |_nested
      |_term
        |_top-hits

If I try the aggregation on a non-nested field (with the nested aggregation removed, of course), everything works as expected. But as it is structured now, I receive an exception from Lucene:

  Child query must not match same docs with parent filter. Combine them as must clauses (+) to find a problem doc. docId=2147483647, class org.apache.lucene.search.ConstantScoreScorer

This exception is not triggered on ElasticSearch 2.4.6.

I tried to structure the aggregations in a different way, but I couldn't come up with a combination that works and delivers the wanted results.

This is what the mapping looks like:

"recording": {
  "dynamic": "strict",
  "_all" : {
    "enabled" : false
  },
  "properties": {
    "id": {
      "type": "integer"
    },
    "soloists": {
      "properties": {
        "type": "nested",
        "person": {
          "properties": {
            "id": {
             "type": "integer"
            },
            "name": {
              "type": "string",
              "index": "not_analyzed"
            }
         }
      }
    },
    "work": {
      "id": {
        "type": integer
      },
      "title": {
        "type": "string",
        "index": "not_analyzed"
      }
    }
}

And the query itself:

{
  "query": {},
  "aggs": {
    "my_top_results": {
      "global": {},
      "aggs": {
        "my_filter_agg": {
          "filter": {
            "bool": {
              "must": [
                {
                  "bool": {
                    "should": [
                      {
                        "nested": {
                          "path": "soloists",
                          "query": {
                            "bool": {
                              "must": {
                                "match": {
                                  "soloists.person.id": 77957
                                }
                              }
                            }
                          }
                        }
                      }
                    ]
                  }
                }
              ]
            }
          },
          "aggs": {
            "my_nested_agg": {
              "nested": {
                "path": "soloists"
              },
              "aggs": {
                "my_terms_agg": {
                  "term": {
                    "field": "soloists.person.id",
                    "size": 10
                  }
                  "aggs": {
                    "my_top_hits_agg": {
                      "size": 1,
                      "_source": {
                        "include": [
                          "soloists.person.id",
                          "soloists.person.name"
                        ]
                      }
                    }
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}

Any help would be highly appreciated.

Some links I stumbled across while looking for a solution:

  • https://issues.apache.org/jira/browse/LUCENE-7674
  • https://discuss.elastic.co/t/querying-on-a-subobject-field-within-a-nested-object/65533
  • https://github.com/elastic/elasticsearch/issues/23280
  • https://github.com/elastic/elasticsearch/issues/11749

asked Oct 17 '22 03:10 by cvursache


1 Answer

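There are several typos in your mapping and query:

  • In the mapping, "type": "nested" sits inside the "properties" of soloists instead of next to it, and the soloists object is never closed.
  • In the mapping, the sub-fields of work are not wrapped in "properties", and the type of work.id is written as integer without quotes.
  • In the query, the aggregation is named term instead of terms, and the comma after that block is missing.
  • In the query, my_top_hits_agg lacks the top_hits wrapper around "size" and "_source".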
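
Purely syntactic typos (an unquoted value, a missing comma) can be caught locally before the request ever reaches Elasticsearch. Below is a minimal Python sketch, assuming the request body is available as a string; check_json is a hypothetical helper name, not part of any Elasticsearch client. It only validates JSON syntax, so it would not notice a misnamed aggregation such as term or a misplaced "type" key.

```python
import json


def check_json(body: str):
    """Return None if body parses as JSON, else a short error description."""
    try:
        json.loads(body)
    except json.JSONDecodeError as err:
        return f"line {err.lineno}, column {err.colno}: {err.msg}"
    return None


# Fragment modelled on the question's mapping: the type value is not quoted.
broken_mapping = '{"id": {"type": integer}}'
# Fragment modelled on the question's query: comma missing between two keys.
broken_query = '{"term": {"size": 10} "aggs": {}}'
# The first fragment with the value quoted.
fixed_mapping = '{"id": {"type": "integer"}}'

for label, body in [
    ("broken mapping", broken_mapping),
    ("broken query", broken_query),
    ("fixed mapping", fixed_mapping),
]:
    problem = check_json(body)
    print(label, "->", problem if problem else "valid JSON")
```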

Here are fixed commands that do not trigger any error on an instance of Elasticsearch 5.6.3.

You can copy and paste them either into Kibana or into a Linux terminal (in which case you should edit the first line, HOST, to point to your server) and test them on your Elasticsearch instance.

HOST=10.225.0.2:9200

curl -XPUT "http://$HOST/an_index"

curl -XPUT "http://$HOST/an_index/recording/_mapping" -H 'Content-Type: application/json' -d'
{
  "dynamic": "strict",
  "_all": {
    "enabled": false
  },
  "properties": {
    "id": {
      "type": "integer"
    },
    "soloists": {
      "type": "nested",
      "properties": {
        "person": {
          "properties": {
            "id": {
              "type": "integer"
            },
            "name": {
              "type": "string",
              "index": "not_analyzed"
            }
          }
        }
      }
    },
    "work": {
      "properties": {
        "id": {
          "type": "integer"
        },
        "title": {
          "type": "string",
          "index": "not_analyzed"
        }
      }
    }
  }
}'

curl -XPOST "http://$HOST/an_index/recording/1" -H 'Content-Type: application/json' -d'
{
  "id": 0,
  "soloists": [
    {
      "person": {
        "id": 77957,
        "name": "John doe"
      }
    },
    {
      "person": {
        "id": 1,
        "name": "Jane smith"
      }
    }
  ],
  "work": {
    "id": 0,
    "title": "Test"
  }
}'

curl -XGET "http://$HOST/an_index/recording/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "size": 0,
  "aggs": {
    "my_top_results": {
      "global": {},
      "aggs": {
        "my_filter_agg": {
          "filter": {
            "bool": {
              "must": [
                {
                  "bool": {
                    "should": [
                      {
                        "nested": {
                          "path": "soloists",
                          "query": {
                            "bool": {
                              "must": {
                                "match": {
                                  "soloists.person.id": 77957
                                }
                              }
                            }
                          }
                        }
                      }
                    ]
                  }
                }
              ]
            }
          },
          "aggs": {
            "my_nested_agg": {
              "nested": {
                "path": "soloists"
              },
              "aggs": {
                "my_terms_agg": {
                  "terms": {
                    "field": "soloists.person.id",
                    "size": 10
                  },
                  "aggs": {
                    "my_top_hits_agg": {
                      "top_hits": {
                        "size": 1,
                        "_source": {
                          "include": [
                            "soloists.person.id",
                            "soloists.person.name"
                          ]
                        }
                      }
                    }
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}'

If these commands work but your query still fails on your own index, could you please update your question with the output of curl -XGET "http://$HOST/your_index_name" so that we can check the exact settings and mapping of your index? Such an error may be caused by a conflict between types in the same index. I'll update my answer accordingly.
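
One thing worth confirming in that output is whether soloists is really mapped as nested in the live index, since both the nested query and the nested aggregation rely on it. The following is a minimal Python sketch, assuming the JSON response has been loaded into a dict and that the "properties" of the recording type have been pulled out of it (in 5.x they sit under <index_name> → "mappings" → "recording"). nested_paths is a hypothetical helper, and the sample mapping is a trimmed copy of the one above.

```python
def nested_paths(properties, prefix=""):
    """Return the dotted paths of all fields mapped with type "nested"."""
    paths = []
    for name, spec in properties.items():
        if not isinstance(spec, dict):
            # Skip malformed entries such as a stray "type": "nested" string.
            continue
        path = f"{prefix}{name}"
        if spec.get("type") == "nested":
            paths.append(path)
        # Object and nested fields carry their sub-fields under "properties".
        paths.extend(nested_paths(spec.get("properties", {}), prefix=path + "."))
    return paths


# Trimmed copy of the corrected mapping (assumption: same shape as the live index).
mapping = {
    "properties": {
        "id": {"type": "integer"},
        "soloists": {
            "type": "nested",
            "properties": {
                "person": {"properties": {"id": {"type": "integer"}}}
            },
        },
        "work": {"properties": {"id": {"type": "integer"}}},
    }
}

print(nested_paths(mapping["properties"]))
```

If soloists does not appear in the result for your own index, the field was indexed as a plain object and the nested aggregation has nothing to work on.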

answered Oct 20 '22 16:10 by Pandawan