Is there any difference between putting the query and filter inside "filtered" and putting the query and filter at the root level? For example:
Case 1:
{ "query":{ "filtered":{ "query":{ "term":{"title":"kitchen3"} }, "filter":{ "term":{"price":1000} } } } }
Case 2:
{ "query":{ "term":{"title":"kitchen3"} }, "filter":{ "term":{"price":1000} } }
I found this discussion: http://elasticsearch-users.115913.n3.nabble.com/Filtered-query-vs-using-filter-outside-td3960119.html, but the URL it references returns a 404 and the explanation is a bit too concise for me.
Could someone explain the difference, or point me to documentation that covers it? Thank you.
Frequently used filters will be cached automatically by Elasticsearch, to speed up performance. Filter context is in effect whenever a query clause is passed to a filter parameter, such as the filter or must_not parameters in the bool query, the filter parameter in the constant_score query, or the filter aggregation.
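As a rough illustration of the filter-context parameters mentioned above, the sketch below builds two request bodies as plain Python dicts and prints them as JSON. The field names and values are borrowed from the question. The "bool" body is only an assumed equivalent of Case 1 for Elasticsearch versions where a "bool" query accepts a "filter" clause; it is not taken from the original thread.

```python
import json

# Sketch only: the term clause under "filter" runs in filter context
# (yes/no matching, no scoring); the clause under "must" is scored.
bool_body = {
    "query": {
        "bool": {
            "must": {"term": {"title": "kitchen3"}},
            "filter": {"term": {"price": 1000}},
        }
    }
}

# constant_score wraps a filter and assigns every match the same score.
constant_score_body = {
    "query": {
        "constant_score": {
            "filter": {"term": {"price": 1000}}
        }
    }
}

print(json.dumps(bool_body, indent=2))
print(json.dumps(constant_score_body, indent=2))
```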
Basically, a query is used when you want to search your documents with scoring, and filters are used to narrow down the set of results the query returns. Filters are boolean: a document either matches or it does not. For example, say you have an index of restaurants, something like Zomato.
The difference is related to performance. A top-level "filter" is always executed after the query. This means the query is executed on all documents, a score is computed for every matching document, and only then are the documents that do not match the filter excluded.
With a "filtered" query, ES has the opportunity to optimize this computation, e.g. by executing the filter first and then running the query on the reduced set of documents. That saves the time spent testing documents that don't match the filter against the query, and the time spent computing scores for them when they do match the query.
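The difference in execution order can be sketched with a toy in-memory model. This is not Elasticsearch code: the document list, the scoring function, and the helper names are all hypothetical, and the only purpose is to count how many documents get scored in each case.

```python
# Toy model, not Elasticsearch internals: all names here are hypothetical.
DOCS = [
    {"title": "kitchen3", "price": 1000},
    {"title": "kitchen3", "price": 2000},
    {"title": "kitchen1", "price": 1000},
    {"title": "kitchen3", "price": 1000},
]


def matches_filter(doc):
    """Boolean filter: price must equal 1000."""
    return doc["price"] == 1000


def make_scorer():
    """Return a scoring function plus a counter of how often it ran."""
    calls = {"n": 0}

    def score(doc):
        calls["n"] += 1
        return 1.0 if doc["title"] == "kitchen3" else 0.0

    return score, calls


def top_level_filter(docs):
    """Case 2: score every document first, drop filter misses afterwards."""
    score, calls = make_scorer()
    scored = [(doc, score(doc)) for doc in docs]
    hits = [doc for doc, s in scored if s > 0 and matches_filter(doc)]
    return hits, calls["n"]


def filtered_query(docs):
    """Case 1: apply the filter first, score only the survivors."""
    score, calls = make_scorer()
    candidates = [doc for doc in docs if matches_filter(doc)]
    hits = [doc for doc in candidates if score(doc) > 0]
    return hits, calls["n"]


post_hits, post_calls = top_level_filter(DOCS)
pre_hits, pre_calls = filtered_query(DOCS)
print(post_calls, pre_calls)
```

Both functions return the same hits; only the number of scoring calls differs, which is the saving described above.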
If you are performing multiple queries with the same filter, there are even more advantages: the filter may be cached, improving the performance of each query even further. This applies to your example: "term" filters are cached by default.
You can also explicitly control the execution of a "filtered" query (see the documentation) to optimize it for your particular use case.
The two kinds of filter can also be thought of as pre-filters and post-filters. As @alexey explained, the root-level filter is applied after the query, while the filter in a "filtered" query is applied before the query.
Beyond the order of execution, you also need to understand how the two differ in scope. The filter in a "filtered" query is part of the query scope, so aggregations are calculated on the filtered output. With a root-level filter, aggregations are calculated on the results of the query alone, before the filter is applied. In both cases the returned documents are the same.
For example, the two queries you have posted will both return the same documents. But if you also request aggregations, the first query will calculate aggregation counts from documents matching title kitchen3 and price 1000, while the second query will calculate aggregation counts from documents matching title kitchen3 only, without the price 1000 filter.
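To make the aggregation difference concrete, here is a small simulation over made-up documents. It is only a sketch of the scoping rule described above, not Elasticsearch code: the sample data, the function names, and the count-by-price aggregation are assumptions chosen for illustration.

```python
from collections import Counter

# Hypothetical sample data; not from the original question.
DOCS = [
    {"title": "kitchen3", "price": 1000},
    {"title": "kitchen3", "price": 1000},
    {"title": "kitchen3", "price": 2000},
    {"title": "kitchen1", "price": 1000},
]


def matches_query(doc):
    return doc["title"] == "kitchen3"


def matches_filter(doc):
    return doc["price"] == 1000


def case1_filtered(docs):
    """Filter is inside the query scope: aggregations see filtered docs."""
    scope = [d for d in docs if matches_query(d) and matches_filter(d)]
    aggs = Counter(d["price"] for d in scope)
    return scope, dict(aggs)


def case2_root_filter(docs):
    """Root-level filter: aggregations see the query results only,
    and the filter is applied to the hits afterwards."""
    scope = [d for d in docs if matches_query(d)]
    aggs = Counter(d["price"] for d in scope)
    hits = [d for d in scope if matches_filter(d)]
    return hits, dict(aggs)


hits1, aggs1 = case1_filtered(DOCS)
hits2, aggs2 = case2_root_filter(DOCS)
print(aggs1)
print(aggs2)
```

The hits are identical in both cases, but the second aggregation also counts the kitchen3 document priced at 2000, because the root-level filter does not affect the aggregation scope.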