ElasticSearch nested query with filter

OK, this one will probably not be too hard for one of you super awesome ElasticSearch experts out there. I've got this nested query, and I also want the results to be filtered on a non-nested field (status). I don't know where to put the filter. I tried putting it in a query (below), but that's not giving me the right results. Can you help me out?

{
  "aggs": {
    "status": {
      "terms": {
        "field": "status",
        "size": 0
      }
    }
  },
  "filter": {
    "nested": {
      "path": "participants",
      "filter": {
        "bool": {
          "must": [
            {
              "term": {
                "user_id": 1
              }
            },
            {
              "term": {
                "archived": false
              }
            },
            {
              "term": {
                "has_unread": true
              }
            }
          ]
        }
      }
    }
  },
  "query": {
    "filtered": {
      "filter": {
        "bool": {
          "must_not": [
            {
              "term": {
                "status": 8
              }
            }
          ]
        }
      }
    }
  }
}
asked May 08 '15 20:05 by Cari



1 Answer

There are a couple of moving pieces here:

  1. The top-level filter that you are using is a "post filter", which removes hits only after the aggregation(s) have been processed, so it has no effect on the aggregation results. It's rather annoying that it exists under that name, but the top-level filter was deprecated back in the 0.90 days and it will be removed entirely in Elasticsearch 5.0.

    • The replacement for it is the more aptly named post_filter.

    You will most likely get better performance by putting the nested filter inside of the filtered query, not to mention it sounds like that is your goal anyway.
  2. Your nested filter's terms are not using the full path to the field, which you should be doing.

    {
      "term": {
        "user_id": 1
      }
    }
    

    Should be:

    {
      "term": {
        "participants.user_id": 1
      }
    }
    

    The same follows for the rest of the nested objects.

  3. Assuming you don't want the status to be 8, then you're doing that perfectly.

  4. Using a size of 0 in the terms aggregation means that you are going to get every bucket back, one per distinct status value. This works fine when there are only a few distinct values, but it would be painful on a field with many of them.
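
To make point 1 concrete, here is a minimal sketch of the request from the question with the top-level filter written under its newer name, post_filter. It assumes an Elasticsearch 1.x cluster, where both the filtered query and post_filter are available, and the nested filter is trimmed to a single term for brevity. The status aggregation is computed over every document whose status is not 8, and the nested filter only trims the returned hits afterwards, so the bucket counts ignore the participants conditions:

```json
{
  "query": {
    "filtered": {
      "filter": {
        "bool": {
          "must_not": { "term": { "status": 8 } }
        }
      }
    }
  },
  "aggs": {
    "status": {
      "terms": { "field": "status" }
    }
  },
  "post_filter": {
    "nested": {
      "path": "participants",
      "filter": {
        "term": { "participants.user_id": 1 }
      }
    }
  }
}
```

If the aggregation should respect the participants conditions too, the nested filter has to live inside the query instead.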

Putting it all together (order is irrelevant, but it's generally a good idea to put aggregations after the query portion because that's how it is executed):

{
  "query": {
    "filtered": {
      "filter": {
        "bool": {
          "must" : {
            "nested" : {
              "path" : "participants",
              "filter": {
                "bool": {
                  "must": [
                    {
                      "term": {
                        "participants.user_id": 1
                      }
                    },
                    {
                      "term": {
                        "participants.archived": false
                      }
                    },
                    {
                      "term": {
                        "participants.has_unread": true
                      }
                    }
                  ]
                }
              }
            }
          },
          "must_not": {
            "term": {
              "status": 8
            }
          }
        }
      }
    }
  },
  "aggs": {
    "status": {
      "terms": {
        "field": "status",
        "size": 0
      }
    }
  }
}

Note: I changed the "must_not" part from an array to a single object. There's nothing wrong with always using the array syntax; I just did it to show that both formats work. Naturally, if you use more than one item, then you must use the array syntax.
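
For example, excluding a second status value needs the array form. This is only a sketch of the bool filter fragment: the value 9 is a made-up status used for illustration and does not come from the question.

```json
{
  "bool": {
    "must_not": [
      { "term": { "status": 8 } },
      { "term": { "status": 9 } }
    ]
  }
}
```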

answered Oct 12 '22 10:10 by pickypg