
Proper way to filter a query with Elasticsearch? (filter vs filtered query)

I am trying to work out whether there is a difference between "filters" and "filtered queries" in Elasticsearch.

The two example requests below return the same results when run against my index.

Are they actually different in some subtle way?

Is there a reason why one would be preferred over the other, in different situations?

DSL giving one top-level query, and one top-level filter:

GET /index/type/_search?_source
{
  "query": {
    "multi_match": {
      "query": "my dog has fleas",
      "fields": ["name", "keywords"]
    }
  },
  "filter": {
    "term": {"status": 2}
  }
}

DSL giving only a top-level query, using the filtered construct:

GET /index/type/_search?_source
{
  "query": {
    "filtered": {
      "query": {
        "multi_match": {
          "query": "my dog has fleas",
          "fields": ["name", "keywords"]
        }
      },
      "filter": {
        "term": {"status": 2}
      }
    }
  }
}
billc asked Mar 18 '15 20:03





1 Answer

The first example uses a top-level filter, which behaves as a post_filter and is sub-optimal from a performance perspective. Filtered queries are preferred because the filter runs before the query. You typically want filters to run first, since scoring a document is more expensive than a boolean pass/fail check; that way the candidate set is cut down before the query runs on it. With a post_filter, the query runs first, every matching document is scored, and only then is the filter applied to the results.
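For readers on newer releases: the filtered query itself was deprecated in Elasticsearch 2.0 and removed in 5.0, and the same "filter first, then score" behaviour is expressed with a bool query that has a filter clause. A minimal sketch of the question's request in that form, assuming a 2.0+ cluster (the type segment is dropped from the URL here because later versions no longer accept it):

```json
GET /index/_search
{
  "query": {
    "bool": {
      "must": {
        "multi_match": {
          "query": "my dog has fleas",
          "fields": ["name", "keywords"]
        }
      },
      "filter": {
        "term": {"status": 2}
      }
    }
  }
}
```

The clause under filter does not contribute to the score; only the must clause is scored.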

The top-level filter parameter was deprecated in 1.0 and renamed to post_filter to clarify its purpose and usage. From the documentation:

"the top-level filter parameter in search has been renamed to post_filter, to indicate that it should not be used as the primary way to filter search results (use a filtered query instead), but only to filter results AFTER facets/aggregations have been calculated."

http://www.elastic.co/guide/en/elasticsearch/reference/current/_search_requests.html
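The case where post_filter is the right tool is the one the quote describes: you want aggregations computed over everything the query matched, while the returned hits are narrowed further. A rough sketch, assuming Elasticsearch 1.x and the question's index; the aggregation name by_status is made up for illustration:

```json
GET /index/type/_search
{
  "query": {
    "multi_match": {
      "query": "my dog has fleas",
      "fields": ["name", "keywords"]
    }
  },
  "aggs": {
    "by_status": {
      "terms": {"field": "status"}
    }
  },
  "post_filter": {
    "term": {"status": 2}
  }
}
```

Here the by_status buckets count documents of every status that match the query, while the hits list contains only those with status 2. If you do not need that split, put the filter inside the query as described above.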

Chris Heald answered Jan 04 '23 21:01
