
ElasticSearch: post_filter or filter?

Let's say I have a situation similar to the one explained here: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-post-filter.html

Before I stumbled upon this article, I had been using filter instead of post_filter for this kind of scenario, and it produced the same output as post_filter.

My question is: are they the same thing? If not, which one is recommended and more efficient, and why?

Sum NL asked Aug 19 '15 02:08



2 Answers

As far as search hits are concerned, they are the same thing: the hits you get will be filtered correctly whether the filter sits in a filtered query or in a post_filter.

However, as far as aggregations are concerned, the end result will not be the same. The difference between the two boils down to which document set the aggregations are computed on.

If your filter is in a filtered query, then your aggregations will be computed on the document set selected by the query(ies) and the filter(s) in your filtered query, i.e. the same set of documents that you will get in the response.

If your filter is in a post_filter, then your aggregations will be computed on the document set selected by your query(ies) alone. Once aggregations have been computed on that document set, the set is further filtered by the filter(s) in your post_filter before the matching documents are returned.
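To make the order of operations concrete, here is a minimal pure-Python sketch that mimics it on an in-memory list. Nothing here talks to Elasticsearch: the `search` function, the `DOCS` list, and the brand/color fields are made up for illustration, loosely following the shirts example in the linked docs.

```python
from collections import Counter

# Hypothetical in-memory "index" (illustrative data only).
DOCS = [
    {"brand": "gucci", "color": "red"},
    {"brand": "gucci", "color": "blue"},
    {"brand": "gucci", "color": "red"},
    {"brand": "prada", "color": "red"},
]


def search(docs, query, filter_=None, post_filter=None):
    """Return (hits, color_counts), mimicking the order of operations."""
    # 1. The query and the (optional) filter together select the document set.
    selected = [d for d in docs if query(d) and (filter_ is None or filter_(d))]
    # 2. Aggregations are computed on that selected set.
    color_counts = dict(Counter(d["color"] for d in selected))
    # 3. The (optional) post_filter narrows the hits only, after aggregation.
    hits = [d for d in selected if post_filter is None or post_filter(d)]
    return hits, color_counts


def is_gucci(doc):
    return doc["brand"] == "gucci"


def is_red(doc):
    return doc["color"] == "red"


# Same predicate, applied as a filter versus as a post_filter.
hits_f, aggs_f = search(DOCS, is_gucci, filter_=is_red)
hits_p, aggs_p = search(DOCS, is_gucci, post_filter=is_red)
```

Both calls return the same two red Gucci hits, but only the post_filter variant still counts the blue shirt in its color aggregation.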

To sum it up,

  • a filtered query affects both search results and aggregations
  • while a post_filter only affects the search results but NOT the aggregations
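For reference, the two request shapes might look roughly like the sketch below, written as Python dicts that serialize to the JSON search body. The field names (`brand`, `color`) and the `colors` aggregation name are assumptions borrowed from the shirts example in the linked docs. Note also that the sketch uses a `bool` query with a `filter` clause, since the `filtered` query was deprecated in Elasticsearch 2.0 in favor of `bool`.

```python
import json


def filtered_request():
    """Both conditions live in the query, so hits AND aggregations see them."""
    return {
        "query": {
            "bool": {
                "filter": [
                    {"term": {"brand": "gucci"}},
                    {"term": {"color": "red"}},
                ]
            }
        },
        "aggs": {"colors": {"terms": {"field": "color"}}},
    }


def post_filter_request():
    """The color condition is a post_filter, so only the hits see it."""
    return {
        "query": {"bool": {"filter": [{"term": {"brand": "gucci"}}]}},
        "aggs": {"colors": {"terms": {"field": "color"}}},
        "post_filter": {"term": {"color": "red"}},
    }


if __name__ == "__main__":
    # Print the JSON bodies as they would be sent to the _search endpoint.
    print(json.dumps(filtered_request(), indent=2))
    print(json.dumps(post_filter_request(), indent=2))
```

With the first body the `colors` aggregation would only ever contain a red bucket. With the second it would list every color of Gucci shirt, while the hits stay limited to red ones.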
Val answered Sep 22 '22 08:09


Another important difference between filter and post_filter that wasn't mentioned in any of the answers: performance.

TL;DR

Don't use post_filter unless you actually need it for aggregations.

From The Definitive Guide:

WARNING: Performance consideration

Use a post_filter only if you need to differentially filter search results and aggregations. Sometimes people will use post_filter for regular searches.

Don’t do this! The nature of the post_filter means it runs after the query, so any performance benefit of filtering (such as caches) is lost completely.

The post_filter should be used only in combination with aggregations, and only when you need differential filtering.
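If you find a post_filter in a search body that has no aggregations, one way to follow this advice is to fold it into the query as a `bool` filter. The helper below is only a sketch: `fold_post_filter` is a hypothetical name, it works on plain dicts rather than any client library, and it assumes the only reason to keep a post_filter is the presence of `aggs`/`aggregations`.

```python
import copy


def fold_post_filter(body):
    """Return a copy of body with post_filter moved into a bool filter.

    The body is returned unchanged (as a copy) when it has no post_filter
    or when it contains aggregations, since then the post_filter may be
    intentional.
    """
    new_body = copy.deepcopy(body)
    if "post_filter" not in new_body:
        return new_body
    if "aggs" in new_body or "aggregations" in new_body:
        return new_body

    post = new_body.pop("post_filter")
    # Fall back to match_all when the request had no query at all.
    query = new_body.get("query", {"match_all": {}})
    # The original query keeps scoring in "must"; the former post_filter
    # runs in filter context and does not affect the score.
    new_body["query"] = {"bool": {"must": [query], "filter": [post]}}
    return new_body


body = {
    "query": {"match": {"title": "shirt"}},
    "post_filter": {"term": {"color": "red"}},
}
rewritten = fold_post_filter(body)
```

The original `body` is left untouched, so the rewrite can be compared against it before being sent.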

Todd Menier answered Sep 18 '22 08:09